CHAPTER 0


Introduction 


These are the extended notes for the INF551 course which I taught at Ecole 
Polytechnique starting from 2019. The goal is to give a first introduction to the 
Curry-Howard correspondence between programs and proofs from a theoretical 
programmer’s perspective: we want to understand the theory behind logic and 
programming languages, but also to write concrete programs (in OCaml) and 
proofs (in Agda). Although most of the material is self-contained, the reader is 
supposed to be already acquainted with logic and programming. 


0.1 Proving instead of testing 


Most of the current software development is validated by performing tests: we 
run the programs with various values for the parameters, chosen in order to 
cover most branches of the program, and, if no bug has occurred during those 
executions, we consider that the program is good enough for production. The 
reason for this is that we consider that if the program uses “small” constants 
and “regular enough” functions then a large number of tests should be able 
to cover all the general behaviors of the program. Seriously following such a 
discipline greatly reduces the number of bugs, especially the simple ones, but we 
all know that it does not completely eliminate them: in some very particular
and unlucky situations, problems still do happen. 

In mathematics, the usual approach is quite different. For instance, when 
proving a property P(n) over natural numbers, a typical mathematician will not 
test that P(0) holds, P(1) holds, P(2) holds, and so on, up to a big number, and, 
if the property is experimentally always verified, claim: “I am almost certain 
that the property P is always true”. He will maybe perform some tests in order 
to determine whether the conjecture is plausible or not, but in the end he will 
write down a proof, which ensures that the property P(n) is always satisfied, for 
eternity, even if someone makes a particularly unlucky or perverse choice for n. 
Proving instead of testing does require some extra work, but the confidence it 
brings to the results is incomparable. 

Let us present an extreme example of why this is the right way to proceed. 
On day one, our mathematician finds out, using formal computation software,
that

\[ \int_0^\infty \frac{\sin(t)}{t}\,dt = \frac{\pi}{2} \]

On day two, he tries to play around a bit with such formulas and finds out that

\[ \int_0^\infty \frac{\sin(t)}{t}\,\frac{\sin(t/101)}{t/101}\,dt = \frac{\pi}{2} \]

On day three, he thinks maybe a pattern could emerge and discovers that

\[ \int_0^\infty \frac{\sin(t)}{t}\,\frac{\sin(t/101)}{t/101}\,\frac{\sin(t/201)}{t/201}\,dt = \frac{\pi}{2} \]

On day four, he gets quite confident and conjectures that, for every n ∈ ℕ,

\[ \int_0^\infty \left( \prod_{i=0}^{n} \frac{\sin(t/(100i+1))}{t/(100i+1)} \right) dt = \frac{\pi}{2} \]

He then spends the rest of the year heating his computer and the planet, suc- 
cessfully proving the conjecture for increasing values of n. This approach seems 
to be justified since the most complicated function involved in here is the sine, 
which is quite regular (it is periodic), and all the constants are small (we get 
factors such as 100), so that if something bad ought to happen it will happen 


for a not-so-big value of n and testing should discover it. In fact, the conjecture 
breaks starting at 


n = 15 341 178 777 673 149 429 167 740 440 969 249 338 310 889 


and none of the usual tests would have found this out. There is a nice explana- 
tion for this which we will not give here, see [BB01, Bae18], but the moral is: if 
you want to be sure of something, don’t test it, prove it. 

On the computer science side, analogous examples abound where errors have 
been found in programs which were heavily tested. The number of such
examples has recently increased with the advent of parallel computing (for instance,
in order to exploit all the cores that you have on your laptop or even your 
smartphone), where bugs might be triggered by some particular and very rare 
scheduling of processes. Already in the 70s, Dijkstra was claiming that “program
testing can be used to show the presence of bugs, but never to show their
absence!” [Dij70], and the idea of formally verifying programs can even be traced
back 20 years before that to, as usual, Turing [Tur49]. If we want to have
software we can really trust (and not merely trust most of the time), we should move from
testing to proving in computer science too. 

In this course, you will learn how to perform such proofs, as well as the 
theory behind it. Actually, the most complicated program we will prove correct 
here is a sorting algorithm and I can already hear you thinking “come on, we 
have been writing sorting algorithms for decades, we should know how to write 
one by now”. While I understand your point, I have two remarks to make.
Firstly, proving a more realistic program is only a matter of time
(and experience): the course covers most of the concepts required to perform 
proofs, and attacking full-fledged code will not require new techniques, only 
patience. Secondly, in 2015, some researchers found out, using formal methods, 
that the default sorting algorithm (the one in the standard library, not some 
obscure library found on GitHub) in both Python and Java (not some obscure 
programming language) was flawed, and the bug had been there for more than 
a decade [dGdBB+19]...


0.2 Typing as proving 


But how can we achieve this goal of applying techniques of proofs to programs? 
It turns out that we do not even need to come up with some new ideas, thanks to 
the so-called proof-as-program correspondence discovered in the 1960s by Curry 
and Howard: a program is secretly the same as a proof! More precisely, in a 
typed functional programming language, the type of a program can be read as 
a formula, and the program itself contains exactly the information required to
prove this formula. This is the one thing to remember from this course: 


PROGRAM = PROOF 


This deep relationship allows the use of techniques from mathematics in order 
to study programs, but also can be used to extract computational contents from 
proofs in mathematics. 

The goal of this course is to give precise meaning to this vague description, 
but let us give an example in order to understand it better. In a functional 
language such as OCaml, we can write a function such as 
let comp f g x = g (f x)
and the compiler will automatically infer a type for it. Here, it will be
('a -> 'b) -> ('b -> 'c) -> 'a -> 'c
meaning that for any types 'a, 'b and 'c,

— if f is a function which takes a value of type 'a as argument and returns
a value of type 'b,

— if g is a function which takes a value of type 'b as argument and returns
a value of type 'c, and

— if x is a value of type 'a,

then the result is of type 'c. For instance, with the function succ of type
int -> int (it adds one to an integer), and the function string_of_int of type
int -> string (it converts an integer to a string), the expression


comp succ string_of_int 2 


will be of type string (it will evaluate to "3"). Now, if we read -> as a logical
implication ⇒, the type can be written as


(A ⇒ B) ⇒ (B ⇒ C) ⇒ (A ⇒ C)


which is a valid formula. This is not by chance: in some sense, the program 
comp can be considered as a way of proving that this formula is valid. 
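
The same reading can be tried on a few other small functions. The following is only a sketch: the names identity, konst and flip_args are chosen here for illustration and are not taken from the course, and each comment gives the formula obtained by reading -> as ⇒ in the type that OCaml infers.

```ocaml
(* The composition from the text: (A => B) => (B => C) => (A => C). *)
let comp f g x = g (f x)

(* Hypothetical names, chosen for this sketch only. *)

(* Inferred type 'a -> 'a, read as A => A. *)
let identity x = x

(* Inferred type 'a -> 'b -> 'a, read as A => (B => A). *)
let konst x _ = x

(* Inferred type ('a -> 'b -> 'c) -> 'b -> 'a -> 'c,
   read as (A => B => C) => (B => A => C). *)
let flip_args f y x = f x y

let () =
  print_endline (comp succ string_of_int 2);
  print_endline (string_of_int (identity (konst 1 "ignored")))
```

In each case the formula is valid, and the body of the function describes how to build a proof of the conclusion from the hypotheses.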

Of course, if we want to prove richer properties of programs (or use programs 
to prove more interesting formulas), we should use a logic which is more expres- 
sive than propositional logic. In this course, we will present dependent types 
which achieve this, while keeping the proof-as-program correspondence. For
instance, the Euclidean division, which computes the quotient and remainder of
two integers, is usually given the type 


int -> int -> int * int 


stating that it takes two integers as arguments and returns a pair of integers. 
This typing is very weak, in the sense that there are many different functions 
which also have this type. With dependent types, we will be able to give it the 
type 


(m : int) → (n : int') → Σ(q : int).Σ(r : int).((m = nq + r) × (0 ≤ r < |n|))
which can be read as the formula


∀m ∈ int.∀n ∈ int'.∃q ∈ int.∃r ∈ int.((m = nq + r) ∧ (0 ≤ r < |n|))


and entirely specifies its behavior (here, int' stands for the type of non-zero
integers, division by zero being undefined).
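
OCaml has no dependent types, so the best we can do there is to test this specification at run time. The following sketch illustrates the gap; the names euclid and satisfies_spec are hypothetical and introduced only for this example. Note that the built-in / and mod round toward zero, so the remainder has to be adjusted to fall in the expected range.

```ocaml
(* Hypothetical helper: the specification from the text, as a boolean test. *)
let satisfies_spec m n (q, r) =
  m = n * q + r && 0 <= r && r < abs n

(* Euclidean division for n <> 0. OCaml's (/) and (mod) round toward zero,
   so the remainder can be negative and must then be shifted by |n|. *)
let euclid m n =
  if n = 0 then invalid_arg "euclid: division by zero";
  let q = m / n and r = m mod n in
  if r >= 0 then (q, r)
  else if n > 0 then (q - 1, r + n)
  else (q + 1, r - n)

(* A test only covers the inputs we try; the dependent type covers all of them. *)
let () =
  List.iter
    (fun (m, n) -> assert (satisfies_spec m n (euclid m n)))
    [ (7, 2); (-7, 2); (7, -2); (-7, -2); (0, 5) ]
```

With the weak type int -> int -> int * int, nothing prevents us from forgetting the adjustment of the remainder; with the dependent type, such a program would simply be rejected.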


0.3 Checking programs 


In order to help formalize proofs with the aid of a computer, people have
developed proof assistants such as Agda, Coq or Lean: those are programs which
help the user to gradually develop proofs (which is necessary due to their typical 
size) and automatically check that they are correct. While those have progressed 
much over the years, in practice, proving that a program is correct still takes 
much more time than testing it (but, again, the result is infinitely superior). For 
this reason, they have been used in areas where there is a strong incentive to do 
it: applications where human lives (or large amounts of money) are involved. 

This technology is part of a bigger family of tools and techniques called 
formal methods whose aim is to guarantee the functioning of programs, with 
various automation levels and expressiveness. As usual, the more precise the 
invariants you are going to prove about your programs, the less automated you 
can be: 


automation <---------------------------------------------> expressiveness

abstract interpretation       Hoare logic        proof assistants
(AbsInt, Astrée, ...)         (Why3, ...)        (Agda, Coq, ...)


There are quite a number of industrial successes of formal methods. For
instance, line 14 of the Paris metro and the CDGVAL at Roissy airport have been
proved using the B-method, Airbus is heavily using various formal tools (AbsInt, 
Astrée, CompCert, Frama-C), etc. We should also mention here the CompCert 
project, which provides a fully certified compiler for a (subset of) C: even though 
your program is proved to be bug-free, the compiler (which is not an easy piece 
of software) might itself be the cause of problems in your program... 

The upside of automated methods is that, of course, they allow for reaching 
one’s goal much more quickly. Their downside is that when they do not apply to 
a particular problem, one is left without a way out. On the other side, virtually 
any property can be shown in a proof assistant, provided that one is smart 
enough and has enough time to spend. 


0.4 Checking proofs 


Verifying that a proof is correct is a task which can be automatically and effi- 
ciently performed (it amounts to checking that a program is correctly typed); in 
contrast, finding a proof is an art. The situation is somewhat similar to analy- 
sis in mathematics, where differentiating a function is a completely mechanical 
task, while integrating requires the use of many methods and tricks [Mun19].
This means that computer science can also be of some help to mathematicians: 
we can formalize mathematical proofs in proof assistants and ensure that no one 
has made a mistake. And it happens that mathematicians, even famous ones, 
make subtle mistakes. 

For instance, in the 1990s, the Fields medalist Voevodsky wrote a paper 
solving an important conjecture by Grothendieck [KV91], which roughly states 
that spaces are the same as strict ∞-categories in which all morphisms are
weakly invertible (don’t worry if you do not precisely understand all the terms 
in this sentence). A few years later, this was shown to be wrong because someone 
provided a counter-example [Sim98], but no one could exactly point out what 
was the mistake in the original proof. Because of this, Voevodsky thought 
for more than 20 years (i.e. even after the counter-example was found) that 
his proof was still correct. Understanding that there was indeed a mistake
led him to use proof assistants for all his proofs and, in fact, propose a new
foundation for mathematics using logics, which is nowadays called homotopy
type theory [Uni13]. Quoting him [Voe14]:


I now do my mathematics with a proof assistant and do not have to 
worry all the time about mistakes in my arguments or about how to 
convince others that my arguments are correct. 


But I think that the sense of urgency that pushed me to hurry with 
the program remains. Sooner or later computer proof assistants will 
become the norm, but the longer this process takes the more misery 
associated with mistakes and with unnecessary self-verification the 
practitioners of the field will have to endure. 


As a much simpler example, suppose that we want to prove that all horses 
have the same color (sic). We show by induction on n the property P(n) = 
“every set of n horses is monochromatic”. For n = 0 and n = 1, the property is 
obvious. Now suppose that P(n) holds and consider a set H of n+1 horses. We 
can picture H as a big set, in which we can pick two distinct elements (horses)
h₁ and h₂ and consider the sets H₁ = H \ {h₂} and H₂ = H \ {h₁}:

[Figure: the set H with its subsets H₁ and H₂.]

By induction hypothesis all the horses in H₁ have the same color and all the
horses in H₂ have the same color. Therefore, by transitivity, all the horses in H
have the same color. Of course this proof is not valid, because we all know that 
there are horses of various colors (can you spot the mistake?). Formalizing the 
proof in a proof assistant will force you to fill in all the details, thus removing 
the possibility for potential errors in vague arguments, and will ensure that the 
arguments given are actually valid, so that flaws such as in the above proof 
will be uncovered. This is not limited to small proofs: large and important 
proofs have been fully checked, such as the four color theorem (in Coq) in graph
theory [Gon08], the Feit-Thompson theorem (in Coq) which is central in the
classification of finite simple groups [GAA+13], a proof of the Kepler conjecture
on dense sphere packing (in HOL Light and Isabelle) [HAB+17], or results from
condensed mathematics (the “liquid tensor experiment”, in Lean) [Sch22]. 


0.5 Searching for proofs 


Closely related to proof checking is proof search, or automated theorem proving, 
i.e. have the computer try by itself to find a proof for a given formula. For 
simple enough fragments of logic (e.g. propositional logic) this can be done: 
proof theory allows us to carefully design efficient new proof search procedures. For
richer logics, it quickly becomes undecidable. However, modern proof assistants
(e.g. Coq or Lean) have so-called tactics which can fill in some specific proofs,
even though the logic is rich. For example, they are able to take care of showing
boring identities such as (x + y) − x = y in abelian groups.

Understanding proof theory allows us to formulate problems in a logical fash- 
ion and solve them. It thus applies to various fields, even outside theoretical 
computer science. For instance, McCarthy, a founder of Artificial Intelligence 
(the name is due to him!), was a strong advocate of using mathematical logic 
to represent knowledge and manipulate it [McC60]. Neural networks are admit- 
tedly more fashionable these days, but one never knows what the future will be 
made of. 

Although we will see some proof search techniques in this course, this will 
not be a central subject. The reason for this is that the main message is that 
we should take proofs seriously: since a proof is the same as a program, we are 
not interested in provability, but rather in proofs themselves, and proof search 
techniques give us very little control over the proofs they produce. 


0.6 Foundations 


At the beginning of the 20th century, some annoying paradoxes surfaced in 
mathematics, such as Russell’s paradox, motivating Hilbert’s program to provide 
an axiomatization on which all mathematics could be founded and show that 
this axiomatization is consistent: this is sometimes called the foundational crisis. 
Although Gödel’s incompleteness theorems established that there is no definite
answer to this question, various formalisms have been elaborated, in which one 
can develop most of usual mathematics. One of the most widely used is set 
theory, as axiomatized by Zermelo and Fraenkel, but other formalisms have 
been proposed, such as Russell’s theory of types [WR12], from which modern
type theory originates: in fact, type theory can be taken as a foundation
of mathematics. People usually see set theory as being more fundamental, since
we see a type as representing a set (e.g. A → B is the set of functions from
the set A to the set B), but we can also view type theory as being more
fundamental since we can formalize set theory in type theory. The one you take 
for foundations is a matter of taste: are you more into chickens or into eggs? 
Type theory also provides a solid framework in which one can study basic 
philosophical questions such as: What is reasoning? What is a proof? If I know 
that something exists, do I know this thing? What does it mean for two things
to be equal? and so on. We could spend pages discussing those matters (and 
others have done so), but we would rather formalize things, and we will see that
very satisfactory answers to those questions can be given with a few inference 
rules. The meaning of life remains an open question, though. 

By taking an increasingly important part in our lives and influencing the way 
we see the (mathematical) world, these ideas have even evolved for some of us
into some sort of religion based on computational trinitarianism, which stems
from the observation that computation manifests itself in three forms [Har11]: 


            categories
           /          \
      logic ---------- programming


The aim of the present text is to explain the bottom line of the above diagram 
and leave categories for other books [Mac71, LS88, Jac99]. Another closely
related religion is constructivism, a doctrine according to which something can
be accepted only if it can actually be constructed. It will play a central role
here, because programs precisely constitute a means to describe the construction
of things. 


0.7 In this course 


As a first introduction to functional and typed languages, we first present OCaml
in chapter 1, in which most example programs given here are written. We
present propositional logic in chapter 2 (the proofs), λ-calculus in chapter 3 (the
programs), and the simply-typed variant of λ-calculus in chapter 4 (the programs
are the proofs). We then generalize the correspondence between proofs and
programs to richer logics: we present first-order logic in chapter 5, and, in
chapter 6, the proof assistant Agda, which is used in chapter 7 to formalize the most
important results in this book, and is based on the dependent types detailed
in chapter 8. We finally give an introduction to the recent developments in 
homotopy type theory in chapter 9. 


0.8 Other references on programs and proofs 


Although we claim some originality in the treatment and the covered topics, 
this book is certainly not the first one about the subject. Excellent references 
include Girard’s Proofs and Types [Gir89], Girard’s Blind Spot [Gir11], Leroy’s
Collège de France course Programmer = démontrer ? La correspondance de
Curry-Howard aujourd’hui, Pierce’s Types and Programming Languages [Pie02],
Sørensen and Urzyczyn’s Lectures on the Curry-Howard isomorphism [SU06],
the “HoTT book” Homotopy Type Theory: Univalent Foundations of
Mathematics [Uni13], Programming Language Foundations in Agda [WK19] and Software
Foundations [PdAC+10].
0.9 About this document


This book was first published in 2020, and the version you are currently read- 
ing was compiled on October 26, 2023. Regular revisions can be expected if 
mistakes are found. Should you find one, please send me a mail at the address
samuel.mimram@lix.polytechnique.fr.


Reading on the beach. A printed copy of this course can be ordered from
Amazon: https://www.amazon.com/dp/BQ8C97TD9IG/.


Color of the cover. In case you wonder, the color of the cover was chosen because 
it seemed obvious to me that 


program = proof = purple 


Code snippets. Most of the code shown in this book is excerpted from larger files 
which are regularly compiled in order to ensure their correctness. The process 
of extracting snippets for inclusion into LaTeX is automated with a tool whose
code is freely available at https://github.com/smimram/snippetor. 


Thanks. Many people have (knowingly or not) contributed to the development 
of this book. I would like to particularly express my thanks to David Baelde, 
Olivier Bournez, Eric Finster, Emmanuel Haucourt, Daniel Hirschkoff, Stéphane 
Lengrand, Assia Mahboubi, Paul-André Melliès, Gabriel Scherer, Pierre-Yves
Strub, Benjamin Werner. 

I would also like to express my thanks to the readers of the book who have 
suggested corrections and improvements: Eduardo Jorge Barbosa, Brian Berns, 
Alve Bjork, Florian Chudigiewitsch, Adam Dingle, Aran Donohue, Maximilian 
Doré, Leonid Dubinsky, Sylvain Henry, Chhi’mèd Künzang, Yuxi Liu, Jeremy
Roach, Kyle Stemen, Marc Sunet, Kenton Van, Yuval Wyborski, Uma Zalakain. 


CHAPTER 1 


Typed functional programming 


1.1 Introduction 


As an illustration of typed functional programming, we present here the OCaml 
programming language, which was developed by Leroy and collaborators, fol- 
lowing ideas from Milner [Mil78]. We present here some of the basics of the 
language both because it will be used in order to provide illustrative implemen- 
tations, and also because we will detail the theory behind it and generalize it in 
later chapters. This is not meant to be a complete introduction to programming 
in OCaml: advanced courses and documentation can be found on the website 
http://ocaml.org/, as well as in books [CMP00, MMH13].

After a brief tour of the language, we present the most important construc- 
tions in section 1.2, and detail recursive types, which are the main way of con- 
structing types throughout the book, in section 1.3. In section 1.4, we present 
the ideas behind the typing system and the guarantees it brings. Finally, we 
illustrate how types can be thought of as formulas in section 1.5. 


1.1.1 Hello world. The mandatory “Hello world!” program, which prints Hello,
world!, can be written as follows:


(* Our first program. *) 
print_endline "Hello, world!" 


This illustrates the concise syntax of the language (compared to Java for in- 
stance). Comments are written using (* ... *). Application of a function to 
arguments does not require parentheses. The indentation is not relevant in
programs (unlike e.g. Python), but you are of course strongly encouraged to indent
your programs nicely. 


1.1.2 Execution. The programs written in OCaml can be compiled to efficient 
native code by using ocamlopt, but there is also a “toplevel” which allows us to
interactively evaluate commands. It can be launched by typing ocaml, or utop 
if you want a fancier version. For instance: 


# 2 + 2;;
- : int = 4


We have typed 2 + 2, followed by ;; to indicate the end of our program. The 
toplevel then indicates that this is an integer (it is of type int) and that the 
result is 4. We call value an expression which cannot be reduced further: 2+2 
is not a value, whereas 4 is: the execution of a program consists in reducing 
expressions to values in an appropriate way (e.g. 2+2 reduces to 4). 
1.1.3 A statically typed language. The OCaml language is typed, meaning
that every term has a type indicating the kind of data it is. A type can be 
thought of as a particular set of values, e.g. int represents the set of integers, 
string represents the set of strings, and so on. In this way, the expressions 2+2 
and 4 have the type int (they are integers), and the function string_of_int 
which gives the string representation of an integer has type int -> string, 
meaning that it is a function which takes an integer as argument and returns a 
string. Moreover, typing is statically checked: when compiling a program, the 
compiler ensures that all the types match, and we use values of the expected 
type. For instance, if we try to compile the program 


let s = string_of_int 3.2 
the compiler will complain with 


Error: This expression has type float but an expression was 
expected of type int 


because the string_of_int function expects an integer whereas we have pro- 
vided a float number as argument. This discipline is very strict in OCaml (we 
sometimes say that the typing is strong): this ensures that the program will not 
raise an error during execution because an unexpected value was provided to a 
function (this is a theorem!). In other words, quoting Milner [Mil78]:


Well-typed programs cannot go wrong. 
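
To get the rejected program above past the type checker, we have to say explicitly which conversion we mean. Here is a minimal sketch of two possible repairs, using only the standard conversion functions string_of_float and int_of_float; the names s1 and s2 are arbitrary.

```ocaml
(* Two well-typed alternatives to the ill-typed string_of_int 3.2. *)

(* Keep the value as a float and use the conversion for floats. *)
let s1 = string_of_float 3.2

(* Or first convert to an integer: int_of_float truncates, giving 3. *)
let s2 = string_of_int (int_of_float 3.2)

let () =
  print_endline s1;
  print_endline s2
```

There is no implicit conversion between int and float in OCaml: the programmer has to choose one, and the compiler checks that the choice is consistent.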


Moreover, the types are inferred, meaning that the user never has to specify the 
types, they are guessed automatically. For instance, in the definition 


let f x = x + 1


we know that the addition takes two integers as arguments and returns an 
integer: therefore x must be of type int and the compiler infers that f must be 
a function of type int -> int. However, if for some reason we want to specify 
the types, it is still possible: 


let f (x : int) : int = x + 1


1.1.4 A functional language. The language is functional, meaning that it 
has good support for defining functions and manipulating them just as any 
other value. For instance, suppose that we have a list l of integers, and we want
to double all its elements. We can use the List.map function from the standard 
library, which is of type 


('a -> 'b) -> 'a list -> 'b list 


meaning that it takes as arguments a function f of type 'a -> 'b (here 'a
and 'b are intended to mean “any type”), and a list whose elements are of type
'a, and returns the list whose elements are of type 'b, obtained by applying f
to all the elements of the list. We can then define the “doubled list” by 


let l2 = List.map (fun x -> 2 * x) l


where we apply the function x ↦ 2 × x to every element: note that the function
List.map takes a function as argument and that we did not even have to give 
a name to the function above by using the construction fun (such a function is 
sometimes called anonymous). 
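
As a complete example, here is a small sketch on a concrete list. The list [1; 2; 3] and the names strings, double and l2' are chosen for illustration only.

```ocaml
let l = [1; 2; 3]

(* Doubling with an anonymous function, as in the text. *)
let l2 = List.map (fun x -> 2 * x) l

(* Here 'a is int and 'b is string: the element type can change. *)
let strings = List.map string_of_int l

(* A named function can be passed in exactly the same way. *)
let double x = 2 * x
let l2' = List.map double l

let () = List.iter print_endline strings
```

The list l itself is not modified: List.map builds a new list.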
1.1.5 Other features. There are some other important features of the OCaml
language that we mention only briefly here, because we will not use them much. 


References. As in most programming languages, OCaml has support for values 
that we can modify, here called references: they can be thought of as memory 
cells, from which we can retrieve the value and also change it. We can create 
a reference r containing x with let r = ref x, then we can obtain its contents 
with !r and change it to y with r := y. For instance, incrementing a counter 
10 times is done by 


let () =
  let r = ref 0 in
  for i = 0 to 9 do
    r := !r + 1
  done
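
In order to observe the final contents of the reference, we can return it after the loop. The following variant is a sketch under that assumption; the function name count_to is hypothetical.

```ocaml
(* Hypothetical helper: the contents of a counter after n increments. *)
let count_to n =
  let r = ref 0 in
  for _i = 1 to n do
    r := !r + 1      (* read the contents with !r, write them with := *)
  done;
  !r                 (* the value of the whole expression *)

let () = Printf.printf "%d\n" (count_to 10)
```

The reference r is local to count_to: each call creates a fresh memory cell, so successive calls do not interfere with each other.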


Garbage collection. Unlike languages such as C, OCaml has a Garbage Collector 
which takes care of allocating and freeing memory. This means that freeing 
memory for a value which is not used anymore is taken care of for us. This 
prevents many common bugs such as writing in a part of memory which was 
freed or freeing a memory region twice by mistake. 


Other traits. In addition to the functional programming style, OCaml has sup- 
port for many other styles of programming including imperative (e.g. references 
described above), objects, etc. OCaml also has support for records, arrays, mod- 
ules, generalized algebraic data types, etc. 


1.2 Basic constructions 


1.2.1 Declarations. Values are declared with the let keyword. For instance, 
let x = 3 


declares that the variable x refers to the value 3. Note that, unless we explicitly 
use the reference mechanism, these values cannot be modified. An expression 
can also contain some local definitions which are only visible in the rest of the 
expression. For instance, in 


let x =
  let y = 2 in
  y * y
the definition of the variable y is only valid in the following expression y * y. 

A program consists of a sequence of such definitions. In the case where a
function “does not return anything”, such as printing, by convention it returns
a value of type unit, and we often use the construction let () = ... in order 
to ensure that this is the case. For instance: 


let x = "hello" 


let () = print_string ("The value of x is " ^ x)
1.2.2 Functions. Functions are also defined using let definitions, specifying
the arguments after the name of the variable. For instance, 

let add x y = x + y

which would be of type 

int -> int -> int 

Note that arrows are implicitly bracketed to the right: this type means 

int -> (int -> int) 


Application of a function to arguments is written as the juxtaposition of the 
function and the arguments, e.g. 


let x = add 3 4 


(no need for parentheses). Partial application is supported, meaning that we do
not have to give all the arguments to a function (functions are sometimes called 
curried). For instance, the incrementing of an integer can be defined by 


let incr = add 1 


The value incr thus defined is the function which takes an argument y and 
returns add 1 y, so that the above definition is equivalent to 


let incr y = add 1 y 


This is in accordance with the bracketing of the types above: add is a function
which, when given an integer argument, returns a function of type int -> int. 

As mentioned above, anonymous functions can be defined by the construc- 
tion fun x -> .... The add function could thus have been equivalently defined 
by 


let add = fun x y -> x + y
or even
let add x = fun y -> x + y
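
We can check on a small example that these definitions behave in the same way, and that each of them supports partial application. This is only a sketch: the names add1, add2, add3 and shifted are introduced here in order to keep the three variants side by side.

```ocaml
(* The three equivalent definitions, under distinct (hypothetical) names. *)
let add1 x y = x + y
let add2 = fun x y -> x + y
let add3 x = fun y -> x + y

(* Partial application: add1 1 has type int -> int. *)
let incr = add1 1

(* A partially applied function can be passed directly to List.map. *)
let shifted = List.map (add3 10) [1; 2; 3]

let () =
  assert (add1 3 4 = add2 3 4 && add2 3 4 = add3 3 4);
  assert (incr 4 = 5)
```

All three have the type int -> int -> int.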


Functions can be recursive, meaning that they can call themselves. In this case, 
the rec keyword has to be used. For instance, the factorial function is defined 
by 


let rec fact n = 
  if n = 0 then 1 else n * fact (n - 1)


It is possible to define two mutually recursive functions f and g by using the 
following syntax: 


let rec f x = ...
and g x = ...


This means that we can use both f and g in the definitions of f and g (see 
figure 1.4 below for an example). 
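
A classical small instance of this syntax, given here only as a sketch (it is not the example of figure 1.4), is the pair of functions testing the parity of a natural number. We assume n ≥ 0, since otherwise the recursion does not terminate.

```ocaml
(* Assumes n >= 0: each call decreases n until it reaches 0. *)
let rec even n = if n = 0 then true else odd (n - 1)
and odd n = if n = 0 then false else even (n - 1)

let () =
  Printf.printf "%b %b\n" (even 10) (odd 10)
```

Without the and keyword, the definition of even would be rejected, because odd would not yet be defined at the point where it is used.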




1.2.3 Booleans. The type corresponding to booleans is bool, its two values 
being true and false. The usual operators are present: conjunction &&, dis- 
junction ||, and negation not. In order to test whether two values x and y 
are equal or different, one can use x = y and x <> y. They can be used in 
conditional branchings 


if ... then... else... 
or loops 
while ... do ... done 


Beware that the operators == and != also exist, but they compare values 
physically, i.e. check whether they have the same memory location, not if they 
have the same contents. For instance, using the toplevel, we have: 

# let x = ref 0;;
val x : int ref = {contents = 0}
# let y = ref 0;;
val y : int ref = {contents = 0}
# x = x;;
- : bool = true
# x = y;;
- : bool = true
# x == x;;
- : bool = true
# x == y;;
- : bool = false
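
The difference matters as soon as references are modified. In the following sketch, the extra name z is an assumption of the example: it is a second name for the cell x, whereas y is a distinct cell which merely starts with the same contents.

```ocaml
let x = ref 0
let y = ref 0   (* a distinct cell with the same contents *)
let z = x       (* another name for the very same cell as x *)

let () =
  assert (x = y && x != y);   (* equal contents, different cells *)
  assert (z == x);            (* same cell *)
  x := 1;
  assert (!z = 1);            (* the change is visible through z *)
  assert (!y = 0);            (* but y is unaffected *)
  assert (x <> y)             (* the contents now differ *)
```

Two physically equal references always have the same contents, while structural equality between references can change over time.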


1.2.4 Products. The pair of x and y is written x,y. For instance, we can 
consider the pair 3,"hello" which has the product type int * string (it is a 
pair consisting of an integer and a string). Note that addition could have been 
defined as 


let add' (x, y) = x + y
resulting in a slightly different function than above: it now has the type 
(int * int) -> int 


meaning that it takes one argument, which is a pair of integers, and returns an 
integer. This means that partial application is not directly available as before, 
although we could still write 


let incr' = fun y -> add (1,y) 
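
The passage between the two styles of functions can itself be programmed. The following sketch assumes two helper functions, named curry and uncurry here for illustration (they are not defined in the text), converting a function on pairs into a function taking its arguments one by one, and back.

```ocaml
(* The two versions of addition discussed in the text. *)
let add x y = x + y
let add' (x, y) = x + y

(* Hypothetical helpers converting between the two styles. *)
let curry f x y = f (x, y)
let uncurry f (x, y) = f x y

(* Partial application is recovered by currying add'. *)
let incr = curry add' 1
```

With these definitions, incr 4 evaluates to 5, and uncurry add behaves as add'.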


1.2.5 Lists. We quite often use lists in OCaml. The empty list is [], and 
x::l is the list obtained by putting the value x before a list l. Most expected
functions on lists are available in the module List. For instance,


— @ concatenates two lists,


— List.length computes the length of a list,


— List.map applies a function to all the elements of a list,


— List.iter executes a function for all the elements of a list,


— List.mem tests whether a value belongs to a list.


1.2.6 Strings. Strings are written as "this is a string" and the related
functions can be found in the module String. For instance, the function String.length
computes the length of a string and String.sub computes a substring (given by a
start index and a length) of a string. Concatenation is obtained by ^.
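
The list and string functions mentioned above can be tried on small values. This is only a sketch: the variable names are arbitrary, and the comments give the values one should obtain.

```ocaml
let l = 1 :: [2; 3]                       (* the list [1; 2; 3] *)
let l2 = l @ [4; 5]                       (* concatenation of lists *)
let n = List.length l2                    (* 5 *)
let squares = List.map (fun x -> x * x) l (* [1; 4; 9] *)
let has_two = List.mem 2 l                (* true *)

let s = "hello" ^ " " ^ "world"           (* concatenation of strings *)
let len = String.length s                 (* 11 *)
let w = String.sub s 6 5                  (* 5 characters starting at index 6 *)
```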


1.2.7 Unit. In OCaml, the type unit contains only one element, written (). 
As explained above, this is the value returned by functions which only have an 
effect and return no meaningful value (e.g. printing a string). They are also 
quite useful as an argument for functions which have an effect. For instance, if 
we define 


let f = print_string "hello" 


the program will write “hello” at the beginning of the execution, because the 
expression defining f is evaluated. However, if we define 


let f () = print_string "hello"


nothing will be printed because we define a function taking a unit as argument. 
In the course of the program, we can then use f () in order to print “hello”. 
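
The difference between the two definitions can be observed without printing anything, by replacing the printing effect with the incrementation of a counter. In this sketch, the reference calls and the names a, after_definitions and after_calls are assumptions made for the example.

```ocaml
(* A counter standing in for a visible effect such as printing. *)
let calls = ref 0

(* Not a function: the effect happens once, when the definition is evaluated. *)
let a = incr calls

(* A function: the effect is delayed until it is applied to (). *)
let f () = incr calls

let after_definitions = !calls   (* 1: only the definition of a had an effect *)
let () = f (); f ()
let after_calls = !calls         (* 3: each application of f had an effect *)
```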


1.3 Recursive types 


A very useful way of defining new data types in OCaml is by recursive types, 
whose elements are constructed from other types using specific constructions, 
called constructors. 


1.3.1 Trees. As a first example, consider trees (more specifically, planar binary
trees with integer labels) such as


      3
     / \
    4   1
   / \
  1   3
     / \
    5   2


Here, a tree consists of finitely many nodes which are labeled by an integer and 
can either have two children, which are themselves trees, or none (in which case 
they are called leaves). This description translates immediately to the following 
type definition in OCaml: 


type tree = 
| Node of int * tree * tree 
| Leaf of int 


This says that a tree is recursively characterized as being Node applied to a 
triple consisting of an integer and two trees or Leaf applied to an integer. For 
instance, the above tree is represented as 


let t = Node (3, Node (4, Leaf 1, Node (3, Leaf 5, Leaf 2)), Leaf 1)


Here, Node and Leaf are not functions (Leaf 1 does not reduce to anything),
they are called constructors. By convention, constructor names have to begin 
with a capital letter, in order to distinguish them from function names (which 
have to begin with a lowercase letter). 


Pattern matching. Any element of the type tree is obtained as a constructor 
applied to some arguments. OCaml provides the match construction which
allows us to distinguish between the various possible cases of constructors and
return a result accordingly: this is called pattern matching. For instance, the
function computing the height of a tree can be implemented as 


let rec height t = 
match t with 
| Node (n, t1, t2) -> 1 + max (height t1) (height t2) 
  | Leaf n -> 0


Here, Node (n, t1, t2) is called a pattern and n, t1 and t2 are variables which 
will be defined as the values corresponding to the tree we are currently matching 
with, and could be given any other names. As another example, the sum of the 
labels in a tree can be computed as 


let rec sum t = 
match t with 
| Node (n, t1, t2) -> n + sum t1 + sum t2 
| Leaf n -> n 


It is sometimes useful to add conditions to matching cases, which can be done 
using the when construction. For instance, if we wanted to match only nodes 
with strictly positive labels, we could have used, in our pattern matching, a case 
of the form 


| Node (n, t1, t2) when n > 0 -> ...


In the case where several cases match, the first one is chosen. OCaml tests that
all the possible values are handled in a pattern matching (and issues a warning 
otherwise). Finally, one can write 


let f = function ... 
instead of 
let f x = match x with ... 


(this shortcut was introduced because it is very common to directly match on 
the argument of a function). 
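
The function shortcut and the when construction can be combined, as in the following sketch. The functions leaves and sum_pos, as well as the example tree (a variant of the tree above with some negative labels), are hypothetical and only serve as illustration.

```ocaml
type tree =
  | Node of int * tree * tree
  | Leaf of int

(* Hypothetical example: number of leaves, written with the function shortcut. *)
let rec leaves = function
  | Node (_, t1, t2) -> leaves t1 + leaves t2
  | Leaf _ -> 1

(* Hypothetical example: sum of the strictly positive labels, using guards.
   Cases are tried in order, so the unguarded cases handle labels <= 0. *)
let rec sum_pos = function
  | Node (n, t1, t2) when n > 0 -> n + sum_pos t1 + sum_pos t2
  | Node (_, t1, t2) -> sum_pos t1 + sum_pos t2
  | Leaf n when n > 0 -> n
  | Leaf _ -> 0

let t = Node (3, Node (-4, Leaf 1, Node (3, Leaf (-5), Leaf 2)), Leaf 1)
```

On this tree, leaves t is 4 and sum_pos t is 10, since the labels -4 and -5 are skipped.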


1.3.2 Usual recursive types. It is interesting to note that many (most) usual 
types can be encoded as recursive types. 


Booleans. The type of booleans can be encoded as 


type bool = True | False


although OCaml chose not to do that for performance reasons. A case
construction


if b then e1 else e2 
could then be encoded as 


match b with 
| True -> e1
| False -> e2 


Lists. Lists are also a recursive type: 


type 'a list = 
| Nil 
| Cons of 'a * 'a list 


In OCaml, [] is a notation for Nil and x::l a notation for Cons (x, l). The
length of a list can be computed as


let rec length l =
  match l with
  | x::l -> 1 + length l
  | [] -> 0


Note that the type of list is parametrized over a type ’a. We are thus able to 
define, at once, the type of lists containing elements of type ’a, for any type ’a. 
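
The recursive definition of lists can be tested directly, provided that the type is given another name in order not to shadow the built-in one. In the following sketch, the names mylist and to_list are assumptions made for the example.

```ocaml
(* A user-defined copy of lists, named mylist to avoid shadowing the built-in type. *)
type 'a mylist =
  | Nil
  | Cons of 'a * 'a mylist

let rec length = function
  | Nil -> 0
  | Cons (_, l) -> 1 + length l

(* Hypothetical conversion to built-in lists. *)
let rec to_list = function
  | Nil -> []
  | Cons (x, l) -> x :: to_list l

(* The same definitions work for elements of any type. *)
let ints = Cons (1, Cons (2, Nil))   (* of type int mylist *)
let strs = Cons ("a", Nil)           (* of type string mylist *)
```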


Coproducts. We have seen that the elements of a product type 'a * 'b are pairs 
x , y consisting of an element x of type ’a and an element y of type ’b. We can 
define coproducts consisting of an element of type ’a or an element of type ’b 
by 


type ('a, 'b) coprod = 
  | Left of 'a
| Right of 'b 


An element of this type is of the form Left x with x of type ’a or Right y with 
y of type ’b. For instance, we can define a function which provides the string 
representation of a value which is either an integer or a float by 


let to_string = function 
| Left n -> string_of_int n 
| Right x -> string_of_float x 


which is of type (int, float) coprod -> string. 


Unit. The type unit has () as the only value. It could have been defined as 


type unit = 
| T 


with () being a notation for T.


Empty type. The “empty type” can be defined as


type empty = |


i.e. a recursive type with no constructor (we still need to write | for syntactical
reasons). There are thus no values of that type.


Natural numbers. Natural numbers (in unary notation) can be defined as 


type nat = 
| Zero 
| Suc of nat 


(any natural number is either 0 or the successor of a natural number) and 
addition as 


let rec add m n =
  match m with
  | Zero -> n
  | Suc m -> Suc (add m n)


Of course, it would be a bad idea to use this type for heavy computations, 
and int provides access to machine binary integers (and thus natural numbers), 
which are much more efficient. 
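
In order to experiment with unary natural numbers, it is convenient to convert them from and to machine integers. The conversions of_int and to_int and the multiplication mul below are hypothetical additions, given as a sketch; of_int sends negative integers to Zero.

```ocaml
type nat =
  | Zero
  | Suc of nat

let rec add m n =
  match m with
  | Zero -> n
  | Suc m -> Suc (add m n)

(* Hypothetical conversions between int and nat. *)
let rec of_int n = if n <= 0 then Zero else Suc (of_int (n - 1))

let rec to_int = function
  | Zero -> 0
  | Suc n -> 1 + to_int n

(* Multiplication by repeated addition: (m + 1) * n = n + m * n. *)
let rec mul m n =
  match m with
  | Zero -> Zero
  | Suc m -> add n (mul m n)
```

For instance, to_int (add (of_int 2) (of_int 3)) evaluates to 5. The number of constructors to traverse is proportional to the numbers themselves, which illustrates why this representation is inefficient.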


1.3.3 Abstract description. As indicated before, a type can be thought of as 
a set of values. We would now like to briefly sketch a mathematical definition 
of the set of values corresponding to inductive types. 

Let us fix a set U, which we can think of as the set of all possible values
an OCaml program can manipulate. We write P(U) for the powerset of U,
i.e. the set of all subsets of U, which is ordered by inclusion. Any recursive
definition induces a function F : P(U) → P(U) sending a set X to the set
obtained by applying the constructors to the elements of X. For instance, with
the definition tree of section 1.3.1, the induced function is

    F(X) = {Node(n,t1,t2) | n ∈ ℕ and t1,t2 ∈ X} ∪ {Leaf(n) | n ∈ ℕ}

The set associated to tree is intuitively the smallest set X ⊆ U which is closed
under adding nodes and leaves, i.e. such that F(X) = X, provided that such a
set exists. Such a set X satisfying F(X) = X is called a fixpoint of F.

In order to be able to interpret the type of trees as the smallest fixpoint of F,
we should first show that such a fixpoint indeed exists. A crucial observation in
order to do so is the fact that the function F : P(U) → P(U) is monotone, in
the sense that, for X, Y ⊆ U,

    X ⊆ Y implies F(X) ⊆ F(Y).


Theorem 1.3.3.1 (Knaster-Tarski [Kna28, Tar55]). The set

    fix(F) = ⋂ {X ∈ P(U) | F(X) ⊆ X}

is the least fixpoint of F: we have

    F(fix(F)) = fix(F)

and, for every fixpoint X of F,

    fix(F) ⊆ X


Proof. We write C = {X ∈ P(U) | F(X) ⊆ X} for the set of prefixpoints of F.
Given X ∈ C, we have

    fix(F) = ⋂C ⊆ X                                   (1.1)

and therefore, since F is increasing,

    F(fix(F)) ⊆ F(X) ⊆ X                              (1.2)

Since this holds for any X ∈ C, we have

    F(fix(F)) ⊆ ⋂C = fix(F)                           (1.3)

Moreover, by monotonicity again, we have

    F(F(fix(F))) ⊆ F(fix(F))

therefore, F(fix(F)) ∈ C, and thus by (1.1)

    fix(F) ⊆ F(fix(F))                                (1.4)

From (1.3) and (1.4), we deduce that fix(F) is a fixpoint of F. An arbitrary
fixpoint X of F necessarily belongs to C and, by (1.2), we have

    fix(F) = F(fix(F)) ⊆ X

fix(F) is thus the smallest fixpoint of F.


Remark 1.3.3.2. The attentive reader will have noticed that all we really used
in the course of the proof was the fact that P(U) is a complete semilattice,
i.e. we can compute arbitrary intersections. Under the more subtle hypothesis
of the Kleene fixpoint theorem (P(U) is a directed complete partial order and F
is Scott-continuous), one can even show that

    fix(F) = ⋃_{n ∈ ℕ} Fⁿ(∅)

i.e. the fixpoint can be obtained by iterating F from the empty set. In the case
of trees,

    F⁰(∅) = ∅
    F¹(∅) = {Leaf(n) | n ∈ ℕ}
    F²(∅) = {Leaf(n) | n ∈ ℕ} ∪ {Node(n,t1,t2) | n ∈ ℕ and t1,t2 ∈ F¹(∅)}

and more generally, Fⁿ(∅) is the set of trees of height strictly below n. The
theorem states that any tree is a tree of some (finite) height.
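
The iteration from the empty set can be experimented with on a finite example, where it stabilizes after finitely many steps. The following sketch does not deal with trees: it assumes a hypothetical monotone function f on subsets of {0, ..., 10}, and the names S, f and lfp are chosen for the illustration.

```ocaml
(* Finite sets of integers. *)
module S = Set.Make (struct type t = int let compare = compare end)

(* Hypothetical monotone function on subsets of {0, ..., 10}:
   F(X) = {0} union {n + 2 | n in X and n + 2 <= 10}. *)
let f x =
  S.add 0 (S.filter (fun n -> n <= 10) (S.map (fun n -> n + 2) x))

(* Iterate F from a given set until F(X) = X. This terminates here
   because the iterates grow inside a finite set. *)
let rec lfp x =
  let x' = f x in
  if S.equal x x' then x else lfp x'

(* The least fixpoint, obtained by starting from the empty set. *)
let evens = S.elements (lfp S.empty)
```

The successive iterates are ∅, {0}, {0, 2}, {0, 2, 4} and so on, and the computation stops on the set of even numbers up to 10, which is the least fixpoint of this particular function.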


Remark 1.3.3.3. In general, there are multiple fixpoints. For instance, for the
function F corresponding to trees, the set of all “trees” where we allow
an infinite number of nodes is also a fixpoint of F.


As a direct corollary of theorem 1.3.3.1, we obtain the following induction
principle:


Corollary 1.3.3.4. Given a set X such that F(X) ⊆ X, we have fix(F) ⊆ X.


Example 1.3.3.5. With the type nat of natural numbers, we have

    F(X) = {Zero} ∪ {Suc(n) | n ∈ X}

We have

    fix(F) = {Sucⁿ(Zero) | n ∈ ℕ} = {Zero, Suc(Zero), Suc(Suc(Zero)), ...}

In the following, we write 0 (resp. S n) instead of Zero (resp. Suc(n)), and
fix(F) = ℕ. The induction principle states that if X contains 0 and is closed
under successor, then it contains all natural numbers. Given a property P(n)
on natural numbers, consider the set

    X = {n ∈ ℕ | P(n)}

The requirement F(X) ⊆ X translates as: P(0) holds and P(n) implies P(S n).
The induction principle is thus the classical induction principle for natural
numbers:

    P(0) ⇒ (∀n ∈ ℕ. P(n) ⇒ P(S n)) ⇒ (∀n ∈ ℕ. P(n))


Example 1.3.3.6. Consider the type empty. We have F(X) = ∅ and thus
fix(F) = ∅. The induction principle states that any property is necessarily
valid on all the elements of the empty set:

    ∀x ∈ ∅. P(x)


Exercise 1.3.3.7. Define the function F associated to the type of lists. Show that 
it also has a greatest fixpoint, distinct from the smallest fixpoint, and provide 
a concrete description of it. 


1.3.4 Option types and exceptions. Another quite useful recursive type 
defined in the standard library is the option type 


type 'a option =
  | Some of 'a
  | None


A value of this type is either of the form Some x for some x of type ’a or None. 
It can be thought of as the type ’a extended with the default value None and 
can be used for functions that, in some cases, do not return a value (in other 
languages such as C or Java, one would return a NULL pointer in this case). 
For instance, the function returning the head of a list is almost always defined, 
except when the argument is the empty list. It thus makes sense to implement 
it as the function of type 'a list -> 'a option defined by 


let hd l =
  match l with
  | x::l -> Some x
  | [] -> None


It is however quite cumbersome to use, because each time we want to use the
result of this function, we have to match it in order to decide whether the result
is None or not. For instance, in order to double the head of a list l of integers
known to be non-empty, we still have to write something like


match hd l with
| Some n -> 2*n
| None -> 0  (* This case cannot happen *)


See figure 1.2 for a more representative example. 


Exceptions. In order to address this, OCaml provides the mechanism of excep- 
tions, which are kinds of errors that can be raised and caught. For instance, in 
the standard library, the exception Not_found is defined by 


exception Not_found 


and a head function can be defined by (the function List.hd of the standard
library is similar, but raises Failure "hd" instead)


let hd l =
  match l with
  | x::l -> x
  | [] -> raise Not_found


It now has type 'a list -> 'a, meaning that we can write

2 * (hd l)

to double the head of a list l. In the case where we take the head of the empty
list, the exception Not_found is raised. We can catch it with the following
construction if we need to:


try
  ...
with
| Not_found -> ...
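
As a concrete instance of this construction, here is a sketch of a function doubling the head of a list of integers, and returning 0 when the list is empty. The name double_head and the choice of 0 as default value are assumptions made for the example.

```ocaml
(* Head of a list, raising Not_found on the empty list, as in the text. *)
let hd l =
  match l with
  | x :: _ -> x
  | [] -> raise Not_found

(* Hypothetical example: double the head, with 0 as result for the empty list. *)
let double_head l =
  try 2 * hd l
  with Not_found -> 0
```

The normal case is written without any matching on the result of hd, and the exceptional case is handled once, in the with clause.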


1.4 The typing system 


We have already explained in section 1.1.3 that OCaml is a strongly typed 
language. We detail here some of the advantages and properties of such a 
typing system. 


1.4.1 Usefulness of typing. One of the main advantages of typed languages is 
that typing ensures their safety: if the program passes the typechecking phase, 
we are guaranteed that we will not have errors during execution caused by 
unexpected data provided to a function, see section 1.4.3. But there are other 
advantages too.


Documentation. Knowing the type of a function is very useful for documenting
it: from the type we can generally deduce the order of the arguments of the
function, what it returns and so on. For instance, the function Queue.add
of the module implementing queues in OCaml has type


'a -> 'a queue -> unit


This allows us to conclude that the function takes two arguments: the first 
argument must be the element we want to add and the second one the queue in 
which we want to add it. Finally, the function does not return anything (to be 
more precise it returns the only value of the type unit): this must mean that 
the structure of queue is modified in place (otherwise, we would have had the 
modified queue as return value). 


Abstraction. Having a typing system is also good for abstraction: we can use a 
data structure without knowing the details of implementation or even having 
access to them. Taking the Queue.add function as example again, we only know
that the second argument is of type 'a queue, without any more details on
this type. This means that we cannot mess with the internals of the data
structure, and that the implementation of queues can be radically modified 
without us having to change our code. 
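
In OCaml, this kind of abstraction is obtained by giving a module a signature in which the type is declared without its definition. The following sketch uses a hypothetical module MyStack of functional stacks (it is not the Queue module of the standard library, and all names are chosen for the illustration).

```ocaml
(* Hypothetical module: the signature hides how stacks are represented. *)
module MyStack : sig
  type 'a t                          (* abstract: no implementation details *)
  val empty : 'a t
  val push : 'a -> 'a t -> 'a t
  val pop : 'a t -> ('a * 'a t) option
end = struct
  type 'a t = 'a list                (* could be changed without breaking users *)
  let empty = []
  let push x s = x :: s
  let pop = function
    | [] -> None
    | x :: s -> Some (x, s)
end

let s = MyStack.push 2 (MyStack.push 1 MyStack.empty)

(* The last pushed element, obtained through the interface only. *)
let top =
  match MyStack.pop s with
  | Some (x, _) -> Some x
  | None -> None
```

Outside of the module, an expression such as s = [2; 1] is rejected by the type checker, because the fact that stacks are implemented by lists is not visible.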


Efficiency. Static typing can also be used to improve efficiency of compiled pro- 
grams. Namely, since we know in advance the type of the values we are going to 
handle, our code can be specific to the corresponding data structure, and avoid 
performing some security checks. For instance, in OCaml, the concatenation 
function on strings can simply put the two strings together; in contrast, in a
dynamically typed programming language such as Python, the concatenation
function on strings has first to check at runtime that the arguments are strings
(and, if they are not, raise an error or try to convert them to strings), and only
then can it put them together.


1.4.2 Properties of typing. There are various flavors of typing systems. 


Dynamic vs static. The types of programs can either be checked during the ex- 
ecution (the typing is dynamic) or during the compilation (the typing is static): 
OCaml is using the latter. Static typing has many advantages: potential er- 
rors are found very early, without having to perform tests, it can help to op- 
timize programs, and provides very strong guarantees on the execution of the 
program. The dynamic approach also has some advantages though: the code is 
more flexible, the runtime can automatically perform conversions between types 
if necessary, etc. 


Weak vs strong. The typing system of OCaml is strong which means that it 
ensures that the values in a type are actually of that type: there is no implicit 
or dynamic type conversion, no NULL pointers, no explicit manipulation of 
pointers, and so on. By contrast, when those requirements are not met, the
typing system is said to be weak.


Decidability of typing. A basic requirement of a typing system is that we should
be able to decide whether a given term has a given type, i.e. we should have a 
type checking algorithm. For OCaml (and all decent programming languages) 
this is the case, and type checking is performed during each compilation of a 
program. 


Type inference. It is often cumbersome to have to specify the type of all the 
terms (or even to give many type annotations). In OCaml, the compiler per- 
forms type inference, which means that it automatically finds a type for the 
program, when there is one. 


Polymorphism. The types in OCaml are polymorphic, which means that they 
can contain variables which are treated as universally quantified. For instance, 
the identity function 


let id x = x 


has the type 'a -> 'a, which can also be read as the universally quantified type
∀A. A → A. This means that we can substitute any type for 'a and still get a
valid type for the identity.


Principal types. A program can admit multiple types. For instance, the identity 
function admits the following types 


'a -> 'a    or    int -> int    or    ('a -> 'b) -> ('a -> 'b)


and infinitely many others. The first one 'a -> 'a is however “more general” 
than the others because all the other types can be obtained by substituting 'a 
by some type. Such a most general type is called a principal type. The type 
inference of OCaml has the property that it always produces a principal type. 
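
These notions can be observed on the identity. In the following sketch, the names id_int and id_fun are hypothetical: they are the same function, given two of its less general types by means of annotations.

```ocaml
let id x = x                       (* inferred principal type: 'a -> 'a *)

(* The same function used at two different types. *)
let n = id 3
let s = id "hello"

(* Less general types, obtained by substituting a type for 'a. *)
let id_int : int -> int = id
let id_fun : ('a -> 'b) -> ('a -> 'b) = id
```

After these definitions, id_int can only be applied to integers, whereas id can still be applied to values of any type.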


1.4.3 Safety. The programs which are well-typed in OCaml are safe in the 
sense that types are preserved during execution and programs do not get stuck. 
In order to formalize these properties, we first need to introduce a notion of 
reduction, which formalizes the way programs are executed. We will first do this 
on a very small (but representative) subset of the language. Most of the concepts 
used here, such as reduction or typing derivation, will be further detailed in 
subsequent chapters. 


A small language. We first introduce a small programming language, which can 
be considered as a very restricted subset of OCaml, that we will implement in 
OCaml itself. In this language, the values we use are either integers or booleans. 
We define a program as being either a value, an addition (p + p’), a comparison 
of integers (p < p’), or a conditional branching (if p then p’ else p”): 


type prog = 
| Bool of bool 
| Int of int 
| Add of prog * prog 
| Lt of prog * prog 
  | If of prog * prog * prog


    n1 + n2 ⟶ n    (where n is the sum of n1 and n2)

         p1 ⟶ p1’                     p2 ⟶ p2’
    --------------------         --------------------
    p1 + p2 ⟶ p1’ + p2          p1 + p2 ⟶ p1 + p2’

        n1 < n2                       n1 ≥ n2
    -----------------            ------------------
    n1 < n2 ⟶ true              n1 < n2 ⟶ false

         p1 ⟶ p1’                     p2 ⟶ p2’
    --------------------         --------------------
    p1 < p2 ⟶ p1’ < p2          p1 < p2 ⟶ p1 < p2’

    if true then p1 else p2 ⟶ p1      if false then p1 else p2 ⟶ p2

                      p ⟶ p’
    ---------------------------------------------
    if p then p1 else p2 ⟶ if p’ then p1 else p2


Figure 1.1: Reduction rules.


A typical program would thus be 
if 3 < 2 then 5 else 1 (1.5) 


which would be encoded as the term 


If (Lt (Int 3 , Int 2) , Int 5 , Int 1) 
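
Conversely, an encoded term can be printed back in the concrete syntax. The function to_string below is a hypothetical helper, given as a sketch: it is not part of the text, and it puts parentheses around every compound program in order to avoid any ambiguity.

```ocaml
type prog =
  | Bool of bool
  | Int of int
  | Add of prog * prog
  | Lt of prog * prog
  | If of prog * prog * prog

(* Hypothetical helper: print a term in concrete syntax, fully parenthesized. *)
let rec to_string = function
  | Bool b -> string_of_bool b
  | Int n -> string_of_int n
  | Add (p, q) -> "(" ^ to_string p ^ " + " ^ to_string q ^ ")"
  | Lt (p, q) -> "(" ^ to_string p ^ " < " ^ to_string q ^ ")"
  | If (p, q, r) ->
    "(if " ^ to_string p ^ " then " ^ to_string q ^ " else " ^ to_string r ^ ")"

(* The encoding of the program (1.5). *)
let example = If (Lt (Int 3, Int 2), Int 5, Int 1)
```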


Reduction. Given programs p and p’, we write p ⟶ p’ when p reduces to p’.
This reduction relation is defined as the smallest one such that, for each of
the rules listed in figure 1.1, if the relation above the horizontal bar holds, the
relation below it also holds. In these rules, we write n1 and n2 for arbitrary
integers and the first rule indicates that the formal sum of two integers reduces
to their sum (e.g. 3 + 2 ⟶ 5).

An implementation of the reduction in OCaml is given in figure 1.2: given a
program p, the function red either returns Some p’ if there exists a program p’
with p ⟶ p’ or None otherwise. Note that the reduction is not deterministic,
i.e. a program can reduce to distinct programs:


    5 + (5 + 4) ⟵ (3 + 2) + (5 + 4) ⟶ (3 + 2) + 9


The implementation provided in figure 1.2 chooses a particular reduction when 
there are multiple possibilities: we say that it implements a reduction strategy. 

A program is irreducible when it does not reduce to another program. It can 
be remarked that 


— values are irreducible, 


— there are irreducible programs which are not values, e.g. 3 + true.


(** Perform one reduction step. *)
let rec red : prog -> prog option = function
  | Bool _ | Int _ -> None
  | Add (Int n1 , Int n2) -> Some (Int (n1 + n2))
  | Add (p1 , p2) ->
    (
      match red p1 with
      | Some p1' -> Some (Add (p1' , p2))
      | None ->
        match red p2 with
        | Some p2' -> Some (Add (p1 , p2'))
        | None -> None
    )
  | Lt (Int n1 , Int n2) -> Some (Bool (n1 < n2))
  | Lt (p1 , p2) ->
    (
      match red p1 with
      | Some p1' -> Some (Lt (p1' , p2))
      | None ->
        match red p2 with
        | Some p2' -> Some (Lt (p1 , p2'))
        | None -> None
    )
  | If (Bool true , p1 , p2) -> Some p1
  | If (Bool false , p1 , p2) -> Some p2
  | If (p , p1 , p2) ->
    match red p with
    | Some p' -> Some (If (p' , p1 , p2))
    | None -> None


Figure 1.2: Implementation of reduction.


                                              n ∈ ℕ
    ---------------    ----------------    -----------
    ⊢ true : bool      ⊢ false : bool      ⊢ n : int

    ⊢ p1 : int    ⊢ p2 : int        ⊢ p1 : int    ⊢ p2 : int
    ------------------------        ------------------------
        ⊢ p1 + p2 : int                 ⊢ p1 < p2 : bool

    ⊢ p : bool    ⊢ p1 : A    ⊢ p2 : A
    ----------------------------------
       ⊢ if p then p1 else p2 : A


Figure 1.3: Typing rules.


In the second case, the reason why the program 3 + true cannot be further 
reduced is that an unexpected value was provided to the sum: we were hoping 
for an integer instead of the value true. We will see that the typing system 
precisely prevents such situations from arising. 
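
Both kinds of irreducible programs can be observed by iterating the one-step reduction until no rule applies. The following sketch restates the reduction function in a more compact form (with the same strategy: the left operand is reduced first) so that the block is self-contained. The helper red_binary and the function normalize are hypothetical names introduced for this illustration.

```ocaml
type prog =
  | Bool of bool
  | Int of int
  | Add of prog * prog
  | Lt of prog * prog
  | If of prog * prog * prog

(* One reduction step, reducing the left operand first. *)
let rec red : prog -> prog option = function
  | Bool _ | Int _ -> None
  | Add (Int n1, Int n2) -> Some (Int (n1 + n2))
  | Add (p1, p2) -> red_binary (fun a b -> Add (a, b)) p1 p2
  | Lt (Int n1, Int n2) -> Some (Bool (n1 < n2))
  | Lt (p1, p2) -> red_binary (fun a b -> Lt (a, b)) p1 p2
  | If (Bool true, p1, _) -> Some p1
  | If (Bool false, _, p2) -> Some p2
  | If (p, p1, p2) ->
    (match red p with
     | Some p' -> Some (If (p', p1, p2))
     | None -> None)

(* Hypothetical helper: reduce the left operand if possible, otherwise the
   right one, and rebuild the program with the constructor mk. *)
and red_binary mk p1 p2 =
  match red p1 with
  | Some p1' -> Some (mk p1' p2)
  | None ->
    (match red p2 with
     | Some p2' -> Some (mk p1 p2')
     | None -> None)

(* Hypothetical helper: apply red until the program is irreducible. *)
let rec normalize p =
  match red p with
  | Some p' -> normalize p'
  | None -> p
```

With these definitions, the program (1.5) normalizes to the value Int 1, whereas Add (Int 3, Bool true) is returned unchanged: it is irreducible without being a value.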


Typing. A type in our language is either an integer (int) or a boolean (bool), 
which can be represented by the type 


type t = TInt | TBool 


We write ⊢ p : A to indicate that the program p has the type A and call it a
typing judgment. This relation is defined inductively by the rules of figure 1.3.
This means that a program p has type A when ⊢ p : A can be derived using the
above rules. For instance, the program (1.5) has type int:


    ⊢ 3 : int    ⊢ 2 : int
    ----------------------
       ⊢ 3 < 2 : bool          ⊢ 5 : int    ⊢ 1 : int
    -------------------------------------------------
           ⊢ if 3 < 2 then 5 else 1 : int


Such a tree showing that a typing judgment is derivable is called a derivation 
tree. The principle of type checking and type inference algorithms of OCaml 
is to try to construct such a derivation of a typing judgment, using the above 
rules. In our small toy language, this is quite easy and is presented in figure 1.4. 
For a language with many more features, such as OCaml (where we have functions
and polymorphism, not to mention objects or generalized algebraic data types),
this is much more subtle, but still follows the same general principle.

It can be observed that a term can have at most one type. We can thus 
speak of the type of a typable program: 
Theorem 1.4.3.1 (Uniqueness of types). Given a program p, if p is both of
types A and A’ then A = A’.


Proof. By induction on p: depending on the form of p, at most one rule applies.
For instance, if p is of the form if p0 then p1 else p2, the only rule which
allows typing p is

    ⊢ p0 : bool    ⊢ p1 : A    ⊢ p2 : A
    -----------------------------------
       ⊢ if p0 then p1 else p2 : A

Since p1 and p2 admit at most one type A by induction hypothesis, p also does.
Other cases are similar.


exception Type_error

(** Infer the type of a program. *)
let rec infer = function
  | Bool _ -> TBool
  | Int _ -> TInt
  | Add (p1 , p2) ->
    check p1 TInt;
    check p2 TInt;
    TInt
  | Lt (p1 , p2) ->
    check p1 TInt;
    check p2 TInt;
    TBool
  | If (p , p1 , p2) ->
    check p TBool;
    let t = infer p1 in
    check p2 t;
    t

(** Check that a program has a given type. *)
and check p t =
  if infer p <> t then raise Type_error


Figure 1.4: Type inference and type checking.
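
The functions infer and check of figure 1.4 can be tried on small terms. The sketch below repeats the required definitions so that it can be run on its own, and adds a hypothetical wrapper type_of (not part of the text) which returns None instead of raising Type_error.

```ocaml
type prog =
  | Bool of bool
  | Int of int
  | Add of prog * prog
  | Lt of prog * prog
  | If of prog * prog * prog

type t = TInt | TBool

exception Type_error

(* Infer the type of a program, as in figure 1.4. *)
let rec infer = function
  | Bool _ -> TBool
  | Int _ -> TInt
  | Add (p1, p2) -> check p1 TInt; check p2 TInt; TInt
  | Lt (p1, p2) -> check p1 TInt; check p2 TInt; TBool
  | If (p, p1, p2) ->
    check p TBool;
    let t = infer p1 in
    check p2 t;
    t

(* Check that a program has a given type. *)
and check p t =
  if infer p <> t then raise Type_error

(* Hypothetical wrapper: Some type for typable programs, None otherwise. *)
let type_of p = try Some (infer p) with Type_error -> None
```

For instance, type_of returns Some TInt on the encoding of the program (1.5), and None on Add (Int 3, Bool true), for which no typing rule applies.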


As explained in section 1.4.2, full-fledged languages such as OCaml do not gen- 
erally satisfy such a strong property. The type of a program is not generally 
unique, but in good typing systems there exists instead a type which is “the 
most general”. 


Safety. We are now ready to formally state the safety properties guaranteed for 
typed programs. The first one, called subject reduction, states that the reduction 
preserves typing: 

Theorem 1.4.3.2 (Subject reduction). Given programs p and p’ such that p ⟶ p’,
if p has type A then p’ also has type A.


Proof. By hypothesis, we have both a derivation of p ⟶ p’ and of ⊢ p : A. We
reason by induction on the former. For instance, suppose that the last rule is

         p1 ⟶ p1’
    --------------------
    p1 + p2 ⟶ p1’ + p2

The derivation of ⊢ p : A necessarily ends with

    ⊢ p1 : int    ⊢ p2 : int
    ------------------------
        ⊢ p1 + p2 : int

In particular, we have ⊢ p1 : int and thus, by induction hypothesis, ⊢ p1’ : int
is derivable. We conclude using the derivation

    ⊢ p1’ : int    ⊢ p2 : int
    -------------------------
        ⊢ p1’ + p2 : int

Other cases are similar.


The second important property is called progress, and states that the program 
either is a value or reduces. 


Theorem 1.4.3.3 (Progress). Given a program p of type A, either p is a value or
there exists a program p’ such that p ⟶ p’.


Proof. By induction on the derivation of ⊢ p : A. For instance, suppose that
the last rule is

    ⊢ p1 : int    ⊢ p2 : int
    ------------------------
        ⊢ p1 + p2 : int

By induction hypothesis, the following cases can happen:

— p1 ⟶ p1’: in this case, we have p1 + p2 ⟶ p1’ + p2,

— p2 ⟶ p2’: in this case, we have p1 + p2 ⟶ p1 + p2’,

— p1 and p2 are values: in this case, they are necessarily integers and p1 + p2
reduces to their sum.

Other cases are similar.


The safety property finally states that typable programs never encounter errors,
in the sense that their execution is never stuck: for instance, we will never try 
to evaluate a program such as 3 + true during the reduction. 


Theorem 1.4.3.4 (Safety). A program p of type A is safe: either 


— p reduces to a value v in finitely many steps

    p ⟶ p1 ⟶ p2 ⟶ ··· ⟶ pn ⟶ v

— or p loops: there is an infinite sequence of reductions

    p ⟶ p1 ⟶ p2 ⟶ ···


Proof. Consider a maximal sequence of reductions from p. If this sequence is 
finite, by maximality, its last element p’ is an irreducible program. Since p is of 
type A and reduces to p’, by the subject reduction theorem 1.4.3.2 p’ also has 
type A. We can thus apply the progress theorem 1.4.3.3 and deduce that either
p’ is a value or there exists p’’ such that p’ ⟶ p’’. The second case is impossible
since it would contradict the maximality of the sequence of reductions.


Of course, in our small language, a program cannot give rise to an infinite 
sequence of reductions, but the formulation and proof of the previous theorem 
will generalize to languages in which this is not the case. The previous properties 
of subject reduction and progress are entirely formalized in section 7.1. 


Limitations of typing. The typing systems (such as the one described above or
the one of OCaml) reject legitimate programs such as


(if true then 3 else false) + 1 


which reduces to a value. Namely, the system imposes the requirement that the 
two branches of a conditional branching should have the same type, which is 
not the case here, even though we know that only the first branch will be taken, 
because the condition is the constant boolean true. We thus ensure that typable 
programs are safe, but not that all safe programs are typable. In fact, this has 
to be this way since an easy reduction to the halting problem shows that the 
safety of programs is undecidable as soon as the language is rich enough. 

Also, the typing system does not prevent all errors from occurring during 
the execution, such as dividing by zero or accessing an array out of its bounds. 
This is because the typing system is not expressive enough. For instance, the 
function 


let f x = 1 / (x - 2)

should intuitively be given the type

    {n : int | n ≠ 2} → int


which states this function is correct as long as its input is an integer different 
from 2, but this is of course not a valid type in OCaml. We will see in chapters 6 
and 8 that some languages do allow such rich typing, at the cost of losing type 
inference (but type checking is still decidable).


1.5 Typing as proving


We would now like to give the intuition for the main idea of this course, that
programs correspond to proofs. Understanding this correspondence in detail
will allow us to design very rich typing systems, in which we can formally prove
fine theorems and reason about programs.


1.5.1 Arrow as implication. As a first illustration of this, we will see here 
that simple types (such as the ones used in OCaml) can be read as propositional 
formulas. The translation is simply a matter of slightly changing the way we 
read types: a type variable ’a can be read as a propositional variable A and the
arrow -> can be read as an implication ⇒. Now, we can observe that there is a
program of a given type (in a reasonable subset of OCaml, see below) precisely
when the corresponding formula is true (for a reasonable notion of true formula).
For instance, we expect that the formula

    A ⇒ A    corresponding to the type    'a -> 'a

is provable. And indeed, there is a program of this type, the identity:

let id : 'a -> 'a = fun x -> x


We have specified here the type of this function for clarity. We can give many
other such examples. For instance, A ⇒ B ⇒ A is proved by


let k : 'a -> 'b -> 'a = fun x y -> x 


The formula (A ⇒ B) ⇒ (B ⇒ C) ⇒ (A ⇒ C) can be proved by the composition


let comp : ('a -> 'b) -> ('b -> 'c) -> ('a -> 'c) = 
fun f g x -> g (f x) 


The formula (A ⇒ B ⇒ C) ⇒ (A ⇒ B) ⇒ (A ⇒ C) is proved by


let s : ('a -> 'b -> 'c) -> ('a -> 'b) -> ('a -> 'c) =
  fun f g x -> f x (g x)


and so on. 
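
Two more examples in the same spirit are sketched below. The names flip and mp are chosen here for the illustration: the first one proves that hypotheses can be exchanged, and the second one is the modus ponens rule.

```ocaml
(* (A => B => C) => (B => A => C): hypotheses can be exchanged. *)
let flip : ('a -> 'b -> 'c) -> ('b -> 'a -> 'c) = fun f b a -> f a b

(* A => (A => B) => B: modus ponens. *)
let mp : 'a -> ('a -> 'b) -> 'b = fun a f -> f a
```

As programs, these proofs can of course also be executed: flip swaps the two arguments of a function and mp applies a function to an argument.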


Remark 1.5.1.1. In general, there is not a unique proof of a given formula. For
instance, A ⇒ A can also be proved by

fun x -> k x 3

where k is the function defined above.


1.5.2 Other connectives. For now, the fragment of the logic we have is very
poor (we only have implication as connective), but other usual connectives also
have counterparts in types.


Conjunction. A conjunction proposition A ∧ B means that both A and B hold.
In terms of types, the counterpart is a product:

    A ∧ B    corresponds to    'a * 'b

and we have programs implementing usual propositions such as A ∧ B ⇒ A:

let proj1 : ('a * 'b) -> 'a = fun (a , b) -> a

or the commutativity of conjunction A ∧ B ⇒ B ∧ A:

let comm : ('a * 'b) -> ('b * 'a) = fun (a , b) -> b , a
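
Other usual properties of conjunction can be proved in the same way. The following sketch gives three of them; the names proj2, pair and assoc are chosen here for the illustration.

```ocaml
(* A /\ B => B: second projection. *)
let proj2 : ('a * 'b) -> 'b = fun (_, b) -> b

(* A => B => A /\ B: introduction of conjunction. *)
let pair : 'a -> 'b -> ('a * 'b) = fun a b -> (a, b)

(* (A /\ B) /\ C => A /\ (B /\ C): associativity of conjunction. *)
let assoc : (('a * 'b) * 'c) -> ('a * ('b * 'c)) =
  fun ((a, b), c) -> (a, (b, c))
```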


Truth. The formula ⊤ corresponding to truth is always provable and we expect
that there is exactly one reason for which it should be true. Thus

    ⊤    corresponds to    unit

and we can prove A ⇒ ⊤:

let unit_intro : 'a -> unit = fun x -> ()


Falsity. The formula ⊥ corresponds to falsity and we do not expect that it can
be proved (because false is never true). We can make it correspond to the empty
type, which can be defined as a type with no constructor:

type empty = |

The formula ⊥ ⇒ A is then shown by

let empty_elim : empty -> 'a = fun x -> match x with _ -> .


(the “.” is a “refutation case” meaning that the compiler should ensure that 
this case should never happen, it is almost never used in OCaml unless you are 
doing tricky stuff such as the above). 


Negation. The negation can then be defined as usual, $\lnot A$ being a notation
for $A \Rightarrow \bot$, and we can prove the reasoning by contraposition


$(A \Rightarrow B) \Rightarrow (\lnot B \Rightarrow \lnot A)$


by


let contr : ('a -> 'b) -> (('b -> empty) -> ('a -> empty)) =
fun f g a -> g (f a)


or $A \Rightarrow \lnot\lnot A$ by


let nni : 'a -> (('a -> empty) -> empty) = fun a f -> f a
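
As a further illustration in the same spirit, here is a sketch of a proof of $\lnot\lnot\lnot A \Rightarrow \lnot A$, a formula which holds intuitionistically even though $\lnot\lnot A \Rightarrow A$ does not. The name tne is ours (it does not come from the text), and the type empty is redeclared so that the snippet is self-contained; the syntax for empty variants assumes OCaml 4.07 or later.

```ocaml
(* The empty type, standing for falsity, as in the text. *)
type empty = |

(* Triple negation elimination: from h : not (not (not A)) and a : A,
   we build a proof of not (not A) out of a (this is exactly [nni a])
   and feed it to h. The name [tne] is an assumption of this sketch. *)
let tne : ((('a -> empty) -> empty) -> empty) -> ('a -> empty) =
  fun h a -> h (fun k -> k a)
```

Note that no exception, recursion or other side effect is used: the term is built from abstraction and application only.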
Disjunction. A disjunction formula $A \lor B$ can be thought of as being either $A$
or $B$. We can implement it as a coproduct type, which is an inductive type
where a value is either a value of type 'a or a value of type 'b, see section 1.3.2:


type ('a , 'b) coprod = Left of 'a | Right of 'b 


We can then prove the formula $A \lor B \Rightarrow B \lor A$, stating that disjunction is
commutative, by


let comm : ('a , 'b) coprod -> ('b , 'a) coprod = fun x -> 
match x with 
| Left a -> Right a 
| Right b -> Left b 


or the distributivity $A \land (B \lor C) \Rightarrow (A \land B) \lor (A \land C)$ of conjunction over
disjunction by


let dist : ('a * ('b, 'c) coprod) -> ('a * 'b, 'a * 'c) coprod =
fun (a, x) ->
match x with
| Left b -> Left (a, b)
| Right c -> Right (a, c)


or the de Morgan formula $(\lnot A \lor B) \Rightarrow (A \Rightarrow B)$ by


let de_Morgan : ('a -> empty, 'b) coprod -> ('a -> 'b) = fun x a -> 
match x with 
| Left f -> empty_elim (f a) 
| Right b -> b 


1.5.3 Limitations of the correspondence. This correspondence has some
limitations due to the fact that OCaml is, after all, a language designed for
programming, not for logic. It is easy to prove formulas which are not true if we
use “advanced” features of the language such as exceptions. For instance, the
following “proves” $A \Rightarrow B$:


let absurd : 'a -> 'b = fun x -> raise Not_found


More annoying counter-examples come from functions which are not terminating
(i.e. looping). For instance, we can also “prove” $A \Rightarrow B$ by


let rec absurd : 'a -> 'b = fun x -> absurd x


Note that, in particular, both allow us to “prove” $\bot$:


let fake : empty = absurd ()


Finally, we can notice that there does not seem to be any reasonable way to
implement the classical formula $\lnot A \lor A$ (apart from using the above tricks),
which would correspond to a program of the type


('a -> empty, 'a) coprod


In the next chapters, we will see that it is indeed possible to design languages in
which a formula is provable precisely when there is a program of the corresponding
type. Such languages do not have functions with “side-effects” (such
as raising an exception) and enforce that all the programs are terminating.
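
To contrast with the excluded middle, which has no honest implementation, here is a sketch showing that its double negation $\lnot\lnot(\lnot A \lor A)$ can be implemented without exceptions or recursion. The name nn_lem is ours, and the types empty and coprod are redeclared as in section 1.5.2 so that the snippet is self-contained.

```ocaml
(* Types from section 1.5.2, repeated for self-containedness. *)
type empty = |
type ('a, 'b) coprod = Left of 'a | Right of 'b

(* Double negation of the excluded middle. Given a refutation k of
   (not A or A), we answer "not A" (the Left case); if this answer is
   ever used with an actual a : A, we call k again with "A" (the Right
   case). The name [nn_lem] is an assumption of this sketch. *)
let nn_lem : (('a -> empty, 'a) coprod -> empty) -> empty =
  fun k -> k (Left (fun a -> k (Right a)))
```

The argument k is used twice here, which is typical of proofs of doubly negated classical formulas.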


CHAPTER 2 


Propositional logic 


In this chapter, we present propositional logic: this is the fragment of logic 
consisting of propositions (very roughly, something which can either be true or 
false) joined by connectives. We will see various ways of formalizing the proofs 
in propositional logic — with a particular focus on natural deduction — and study 
the properties of those. We begin with the formalism of natural deduction in 
section 2.2, show that it enjoys the cut elimination property in section 2.3 and 
discuss strategies for searching for proofs in section 2.4. The classical variant 
of logic is presented in section 2.5. We then present two alternative logical 
formalisms: sequent calculus (section 2.6) and Hilbert calculus (section 2.7). 
Finally, we introduce Kripke semantics in section 2.8, which can be considered 
as an intuitionistic counterpart of boolean models for classical logic. 


2.1 Introduction 


2.1.1 From provability to proofs. Most of you are acquainted with boolean 
logic based on the booleans, which we write here as 0 for false, and 1 for true. 
In this setting, every propositional formula can be interpreted as a boolean, 
provided that we have an interpretation for the variables. The truth tables for 
usual connectives are 


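$A \land B$ | 0  1        $A \lor B$ | 0  1        $A \Rightarrow B$ | 0  1
     0      | 0  0             0     | 0  1                0         | 1  1
     1      | 0  1             1     | 1  1                1         | 0  1


(rows are indexed by the value of $A$ and columns by the value of $B$). For
instance, we know that the formula $A \Rightarrow A$ is valid because, for whichever
interpretation of $A$ as a boolean, the induced interpretation of the formula is 1.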

We have this idea that propositions should correspond to types. Therefore,
rather than booleans, propositions should be interpreted as sets of values and
implications as functions between the corresponding values. For instance, if
we write $\mathbb{N}$ for a proposition interpreted as the set of natural numbers, the
type $\mathbb{N} \Rightarrow \mathbb{N}$ would correspond to the set of functions from natural numbers to
themselves. We now see that the boolean interpretation is very weak: it only
cares about whether sets are empty or not. For instance, depending on whether
the sets $A$ and $B$ are empty ($\emptyset$) or non-empty ($\lnot\emptyset$), the following table indicates
whether the set $A \to B$ of functions from $A$ to $B$ is empty or not (rows are
indexed by $A$ and columns by $B$):


$A \to B$          | $\emptyset$        $\lnot\emptyset$
$\emptyset$        | $\lnot\emptyset$   $\lnot\emptyset$
$\lnot\emptyset$   | $\emptyset$        $\lnot\emptyset$


Reading $\emptyset$ as “false” and $\lnot\emptyset$ as “true”, we see that we recover the usual truth
table for implication. In this sense, the fact that the formula $\mathbb{N} \Rightarrow \mathbb{N}$ is true only
shows that there exists such a function, but in fact there are many such functions,
and we would like to be able to reason about the various functions themselves.

Actually, the interpretation of implications as sets of functions is still not
entirely satisfactory because, given a function of type $\mathbb{N} \to \mathbb{N}$, there are many
ways to implement it. We could have programs of different complexities, using
their arguments in different ways, and so on. For instance, the constant function
$x \mapsto 0$ can be implemented as


let f x = 0


or


let f x = x - x


or


let rec f x = if x = 0 then x else f (x - 1)


respectively in constant, logarithmic and linear complexity (if we assume the
predecessor to be computed in constant time). We thus want to shift from
an extensional perspective, where two functions are equal when they have the
same values on the same inputs, to an intentional one where the way the result
is computed matters. This means that we should be serious about what
a program is, or equivalently a proof, and define it precisely so that we can
reason about the proofs of a proposition instead of its provability: we want to
know what the proofs are and not only whether there exists one or not. This
is the reason why Girard advocates that there are three levels for interpreting
proofs [Gir11, section 7.1]:


0. the boolean level: propositions are interpreted as booleans and we are 
interested in whether a proposition is provable or not, 


1. the extensional level: propositions are interpreted as sets and we are in- 
terested in which functions can be implemented, 


2. the intentional level: we are interested in the proofs themselves (and how 
they evolve via cut elimination). 


2.1.2 Intuitionism. This shift from provability to proofs was started by the 
philosophical position of Brouwer starting in the early twentieth century, called 
intuitionism. According to this point of view, mathematics does not consist 
in discovering the properties of a preexisting objective reality, but is rather a 
mental subjective construction, which is independent of the reality and has an 
existence on its own, whose validity follows from the intuition of the mathe- 
matician. From this point of view 


— the conjunction $A \land B$ of two propositions should be seen as having both
a proof of $A$ and a proof of $B$: if we interpret propositions as sets, $A \land B$
should not be interpreted as the intersection $A \cap B$, but rather as the
product $A \times B$,


— a disjunction $A \lor B$ should be interpreted as having a proof of $A$ or a proof
of $B$, i.e. it does not correspond to the union $A \cup B$, but rather to the
disjoint union $A \sqcup B$,


— an implication $A \Rightarrow B$ should be interpreted as having a way to construct
a proof of $B$ from a proof of $A$,


— a negation $\lnot A = A \Rightarrow \bot$ should be interpreted as having a counter-example
to $A$, i.e. a way to produce an absurdity from a proof of $A$.


Interestingly, this led Brouwer to reject principles which are classically valid.
For instance, according to this point of view $\lnot\lnot A$ should not be considered as
equivalent to $A$ because the implication


$\lnot\lnot A \Rightarrow A$


should not hold: if we can show that there is no counter-example to $A$, this
does not mean that we actually have a proof of $A$. For instance, suppose that
I cannot find my key inside my apartment and my door is locked: I must have
locked my door so that I know that my key is somewhere in the apartment and
it is not lost, but I still cannot find it. Not having lost my key (i.e. not not
having my key) does not mean that I have my key; in other words,


$\lnot\lnot\mathrm{Key} \Rightarrow \mathrm{Key}$


does not hold (explanation borrowed from Ingo Blechschmidt). For similar
reasons, Brouwer also rejected the excluded middle


$\lnot A \lor A$


given an arbitrary proposition $A$: in order to have a proof for it, we should
have a way, whichever the proposition $A$ is, to produce a counter-example to it
or a proof of it. A logic rejecting these principles is called intuitionistic and, by
opposition, we speak of classical logic when they are admitted.


2.1.3 Formalizing proofs. Our goal is to give a precise definition of what a
proof is. This will be done by formalizing the rules which we usually use to
construct our reasoning. For instance, suppose that we want to prove that the
function $x \mapsto 2 \times x$ is continuous at 0: we have to prove the formula


$\forall \varepsilon.(\varepsilon > 0 \Rightarrow \exists \eta.(\eta > 0 \land \forall x.|x| < \eta \Rightarrow |2x| < \varepsilon))$


This is done in the following steps, resulting in the following transformed formulas
to be proved.


— Suppose given $\varepsilon$, we have to show:

$\varepsilon > 0 \Rightarrow \exists \eta.(\eta > 0 \land \forall x.|x| < \eta \Rightarrow |2x| < \varepsilon)$

— Suppose that $\varepsilon > 0$ holds, we have to show:

$\exists \eta.(\eta > 0 \land \forall x.|x| < \eta \Rightarrow |2x| < \varepsilon)$

— Take $\eta = \varepsilon/2$, we have to show:

$\varepsilon/2 > 0 \land \forall x.|x| < \varepsilon/2 \Rightarrow |2x| < \varepsilon$

— We have to show both

$\varepsilon/2 > 0$ and $\forall x.|x| < \varepsilon/2 \Rightarrow |2x| < \varepsilon$

— For $\varepsilon/2 > 0$:

  — because $2 > 0$, this amounts to showing $(\varepsilon/2) \times 2 > 0 \times 2$,
  — which, by usual identities, amounts to showing $\varepsilon > 0$,
  — which is a hypothesis.

— For $\forall x.|x| < \varepsilon/2 \Rightarrow |2x| < \varepsilon$:

  — suppose given $x$, we have to show: $|x| < \varepsilon/2 \Rightarrow |2x| < \varepsilon$,
  — suppose that $|x| < \varepsilon/2$ holds, we have to show: $|2x| < \varepsilon$,
  — since $2 > 0$, this amounts to showing: $|2x|/2 < \varepsilon/2$,
  — which, by usual identities, amounts to showing: $|x| < \varepsilon/2$,
  — which is a hypothesis.
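Now that we have decomposed the proof into very small steps, it seems possible
to give a list of all the generic rules that we are allowed to apply in a reasoning.
We will do so and will introduce a convenient formalism and notations, so that
the above proof will be written as:

\[
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{
        \dfrac{
          \dfrac{\varepsilon > 0 \vdash \varepsilon > 0}
                {\varepsilon > 0 \vdash (\varepsilon/2) \times 2 > 0 \times 2}
        }{\varepsilon > 0 \vdash \varepsilon/2 > 0}
        \qquad
        \dfrac{
          \dfrac{
            \dfrac{
              \dfrac{\varepsilon > 0, |x| < \varepsilon/2 \vdash |x| < \varepsilon/2}
                    {\varepsilon > 0, |x| < \varepsilon/2 \vdash |2x|/2 < \varepsilon/2}
            }{\varepsilon > 0, |x| < \varepsilon/2 \vdash |2x| < \varepsilon}
          }{\varepsilon > 0 \vdash |x| < \varepsilon/2 \Rightarrow |2x| < \varepsilon}
        }{\varepsilon > 0 \vdash \forall x.|x| < \varepsilon/2 \Rightarrow |2x| < \varepsilon}
      }{\varepsilon > 0 \vdash \varepsilon/2 > 0 \land \forall x.|x| < \varepsilon/2 \Rightarrow |2x| < \varepsilon}
    }{\varepsilon > 0 \vdash \exists \eta.(\eta > 0 \land \forall x.|x| < \eta \Rightarrow |2x| < \varepsilon)}
  }{\vdash \varepsilon > 0 \Rightarrow \exists \eta.(\eta > 0 \land \forall x.|x| < \eta \Rightarrow |2x| < \varepsilon)}
}{\vdash \forall \varepsilon.(\varepsilon > 0 \Rightarrow \exists \eta.(\eta > 0 \land \forall x.|x| < \eta \Rightarrow |2x| < \varepsilon))}
\]

(when read from bottom to top, you should be able to see the precise correspondence
with the previous description of the proof).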


2.1.4 Properties of the logical system. Once we have formalized our log- 
ical system we should do some sanity checks. The first requirement is that it 
should be consistent: there is at least one formula A which is not provable (oth- 
erwise, the system would be entirely pointless). The second requirement is that 
typechecking should be decidable: there should be an algorithm which checks 
whether a proof is valid or not. In contrast, the question of whether a formula 
is provable or not will not be decidable in general and we do not expect to have 
an algorithm for that. 


2.2 Natural deduction 


Natural deduction is the first formalism for proofs that we will study. It was
introduced by Gentzen [Gen35]. We first present the intuitionistic version.

2.2.1 Formulas. We suppose fixed a countably infinite set $\mathcal{X}$ of propositional
variables. The set $\mathcal{A}$ of formulas or propositions is generated by the following
grammar

\[ A, B ::= X \mid A \Rightarrow B \mid A \land B \mid \top \mid A \lor B \mid \bot \mid \lnot A \]

where $X$ is a propositional variable (in $\mathcal{X}$) and $A$ and $B$ are propositions. They
are respectively read as a propositional variable, implication, conjunction, truth,
disjunction, falsity and negation. By convention, $\lnot$ binds the most tightly, then
$\land$, then $\lor$, then $\Rightarrow$:

$\lnot A \lor B \land C \Rightarrow D$ reads as $((\lnot A) \lor (B \land C)) \Rightarrow D$

Moreover, all binary connectives are implicitly bracketed to the right:

$A_1 \land A_2 \land A_3 \Rightarrow B \Rightarrow C$ reads as $(A_1 \land (A_2 \land A_3)) \Rightarrow (B \Rightarrow C)$

This is particularly important for $\Rightarrow$; for the connectives $\land$ and $\lor$ the other
convention could be chosen with almost no impact. We sometimes write $A \Leftrightarrow B$
for $(A \Rightarrow B) \land (B \Rightarrow A)$.

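A subformula of a formula $A$ is a formula occurring in $A$. The set of subformulas
of $A$ can formally be defined by induction on $A$ by

\[
\begin{aligned}
\mathrm{Sub}(X) &= \{X\} & \mathrm{Sub}(A \Rightarrow B) &= \{A \Rightarrow B\} \cup \mathrm{Sub}(A) \cup \mathrm{Sub}(B)\\
\mathrm{Sub}(\top) &= \{\top\} & \mathrm{Sub}(A \land B) &= \{A \land B\} \cup \mathrm{Sub}(A) \cup \mathrm{Sub}(B)\\
\mathrm{Sub}(\bot) &= \{\bot\} & \mathrm{Sub}(A \lor B) &= \{A \lor B\} \cup \mathrm{Sub}(A) \cup \mathrm{Sub}(B)\\
 & & \mathrm{Sub}(\lnot A) &= \{\lnot A\} \cup \mathrm{Sub}(A)
\end{aligned}
\]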
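
The grammar and the definition of subformulas translate directly to an inductive type and a recursive function. The following is a minimal sketch: the constructor names, the use of strings for variables and the representation of sets as sorted duplicate-free lists are choices made here for illustration, not part of the text.

```ocaml
(* Formulas, following the grammar of section 2.2.1.
   Variables are represented by strings (an assumption of this sketch). *)
type formula =
  | Var of string
  | Imp of formula * formula
  | And of formula * formula
  | Or of formula * formula
  | True
  | False
  | Not of formula

(* Set of subformulas of a formula, represented as a sorted list
   without duplicates. Each case mirrors one equation defining Sub. *)
let rec sub (a : formula) : formula list =
  let below =
    match a with
    | Var _ | True | False -> []
    | Not b -> sub b
    | Imp (b, c) | And (b, c) | Or (b, c) -> sub b @ sub c
  in
  List.sort_uniq compare (a :: below)
```

For instance, `sub (Imp (Var "X", Var "X"))` contains two elements: the implication itself and the variable.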


2.2.2 Sequents. A context

$\Gamma = A_1, \ldots, A_n$

is a list of propositions. A sequent, or judgment, is a pair

$\Gamma \vdash A$

consisting of a context $\Gamma$ and a formula $A$. Such a sequent should be read as
“under the hypotheses in $\Gamma$, I can prove $A$” or “supposing that I can prove the
propositions in $\Gamma$, I can prove $A$”. The comma in a context can thus be read
as a “meta” conjunction (the logical conjunction being $\land$) and the sign $\vdash$ as a
“meta” implication (the logical implication being $\Rightarrow$).

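Remark 2.2.2.1. The notation derives from Frege’s Begriffsschrift [Fre79], an
axiomatization of first-order logic based on a graphical notation, in which logical
connectives are drawn by using wires of particular shapes: the formulas $\lnot A$,
$A \Rightarrow B$ and $\forall x.A$ each have their own drawing [diagrams not reproduced here].
In this system, given a proposition drawn ── A, the notation ├── A means
that $A$ is provable. The assertion that $(\forall x.A) \Rightarrow (\exists x.B)$ is provable would for
instance be written as a single such diagram [not reproduced here]
(in classical logic, the formula $\exists x.B$ is equivalent to $\lnot\forall x.\lnot B$). The symbol $\vdash$
used in sequents, as well as the symbol $\lnot$ for negation, originate from there.

\[
\dfrac{}{\Gamma, A, \Gamma' \vdash A}\ (\text{ax})
\]

\[
\dfrac{\Gamma \vdash A \Rightarrow B \qquad \Gamma \vdash A}{\Gamma \vdash B}\ (\Rightarrow_E)
\qquad\qquad
\dfrac{\Gamma, A \vdash B}{\Gamma \vdash A \Rightarrow B}\ (\Rightarrow_I)
\]

\[
\dfrac{\Gamma \vdash A \land B}{\Gamma \vdash A}\ (\land_E^l)
\qquad
\dfrac{\Gamma \vdash A \land B}{\Gamma \vdash B}\ (\land_E^r)
\qquad\qquad
\dfrac{\Gamma \vdash A \qquad \Gamma \vdash B}{\Gamma \vdash A \land B}\ (\land_I)
\]

\[
\dfrac{}{\Gamma \vdash \top}\ (\top_I)
\]

\[
\dfrac{\Gamma \vdash A \lor B \qquad \Gamma, A \vdash C \qquad \Gamma, B \vdash C}{\Gamma \vdash C}\ (\lor_E)
\qquad\qquad
\dfrac{\Gamma \vdash A}{\Gamma \vdash A \lor B}\ (\lor_I^l)
\qquad
\dfrac{\Gamma \vdash B}{\Gamma \vdash A \lor B}\ (\lor_I^r)
\]

\[
\dfrac{\Gamma \vdash \bot}{\Gamma \vdash A}\ (\bot_E)
\]

\[
\dfrac{\Gamma \vdash \lnot A \qquad \Gamma \vdash A}{\Gamma \vdash \bot}\ (\lnot_E)
\qquad\qquad
\dfrac{\Gamma, A \vdash \bot}{\Gamma \vdash \lnot A}\ (\lnot_I)
\]

Figure 2.1: NJ: rules of intuitionistic natural deduction.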
2.2.3 Inference rules. An inference rule, written

\[
\dfrac{\Gamma_1 \vdash A_1 \qquad \cdots \qquad \Gamma_n \vdash A_n}{\Gamma \vdash A} \tag{2.1}
\]

consists of $n$ sequents $\Gamma_i \vdash A_i$, called the premises of the rule, and a sequent
$\Gamma \vdash A$, called the conclusion of the rule. We sometimes identify the rules by a
name given to them, which is written on the right of the rule. Some rules also
come with external hypotheses on the formulas occurring in the premises: those
are called side conditions. There are two ways to read an inference rule:


— the deductive way, from top to bottom: from a proof for each of the
premises $\Gamma_i \vdash A_i$ we can deduce $\Gamma \vdash A$,


— the inductive or proof search way, from bottom to top: if we want to prove
$\Gamma \vdash A$ by that inference rule we need to prove all the premises $\Gamma_i \vdash A_i$.


Both are valid ways of thinking about proofs, but one might be more natural 
than the other one depending on the application. 


2.2.4 Intuitionistic natural deduction. The rules for intuitionistic natural
deduction are shown in figure 2.1, the resulting system often being called NJ (N
for natural deduction and J for intuitionistic). Apart from the axiom rule (ax),
each rule is specific to a connective and the rules can be classified in two families
depending on whether this connective appears in the conclusion or in the
premises:

— the elimination rules allow the use of a formula with a given connective
(which is in the formula in the leftmost premise, called the principal
premise),


— the introduction rules construct a formula with a given connective.


In figure 2.1, the elimination (resp. introduction) rules are displayed on the left
(resp. right) and bear names of the form $(\ldots_E)$ (resp. $(\ldots_I)$).

The axiom rule allows the use of a formula in the context $\Gamma$: supposing that
a formula $A$ holds, we can certainly prove it. This rule is the only one to really
make use of the context: when read from the bottom to top, all the other rules
either propagate the context or add hypotheses to it, but never inspect it.

The introduction rules are the easiest to understand: they allow proving
a formula with a given logical connective from the proofs of the immediate
subformulas. For instance, $(\land_I)$ states that from a proof of $A$ and a proof of $B$,
we can construct a proof of $A \land B$. Similarly, the rule $(\Rightarrow_I)$ follows the usual
reasoning principle for implication: if, after supposing that $A$ holds, we can
show $B$, then $A \Rightarrow B$ holds.

In contrast, the elimination rules allow the use of a connective. For instance,
the rule $(\Rightarrow_E)$, which is traditionally called modus ponens or detachment rule,
says that if $A$ implies $B$ and $A$ holds then certainly $B$ must hold. The rule
$(\lor_E)$ is more subtle and corresponds to a case analysis: if we can prove $A \lor B$
then, intuitively, we can prove $A$ or we can prove $B$. If in both cases we can
deduce $C$ then $C$ must hold. The elimination rule $(\bot_E)$ is sometimes called ex
falso quodlibet or the explosion principle: it states that if we can prove false
then the whole logic collapses, and we can prove anything.

We can notice that there is no elimination rule for $\top$ (knowing that $\top$ is
true does not bring any new information), and no introduction rule for $\bot$ (we
do not expect that there is a way to prove falsity). There are two elimination
rules for $\land$ which are respectively called left and right rules, and similarly there
are two introduction rules for $\lor$.


2.2.5 Proofs. The set of proofs (or derivations) is the smallest set such that
given proofs $\pi_i$ of the sequent $\Gamma_i \vdash A_i$, for $1 \leq i \leq n$, and an inference rule of
the form (2.1), there is a proof of $\Gamma \vdash A$, often written in the form of a tree as

\[
\dfrac{
  \begin{array}{c}\pi_1\\ \Gamma_1 \vdash A_1\end{array}
  \qquad \cdots \qquad
  \begin{array}{c}\pi_n\\ \Gamma_n \vdash A_n\end{array}
}{\Gamma \vdash A}
\]

A sequent $\Gamma \vdash A$ is provable (or derivable) when it is the conclusion of a proof.
A formula $A$ is provable when it is provable without hypothesis, i.e. when the
sequent $\vdash A$ is provable.
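
To make the notion of proof concrete, here is a small sketch of a proof checker, restricted to implication to keep it short. All the names (formula, proof, check, and so on) are ours, and the representation, where each node of the tree stores its conclusion, is only one possible choice.

```ocaml
(* Formulas with implication only, and sequents as pairs (context, formula). *)
type formula = Var of string | Imp of formula * formula
type sequent = formula list * formula

(* Proof trees: each node carries its conclusion and its subproofs. *)
type proof =
  | Ax of sequent                    (* axiom rule *)
  | ImpI of sequent * proof          (* introduction of implication *)
  | ImpE of sequent * proof * proof  (* elimination of implication *)

(* Conclusion of a proof. *)
let conclusion = function
  | Ax s | ImpI (s, _) | ImpE (s, _, _) -> s

(* Check that every node of the tree is a correct rule instance. *)
let rec check (p : proof) : bool =
  match p with
  | Ax (gamma, a) -> List.mem a gamma
  | ImpI ((gamma, c), q) ->
    (match c with
     | Imp (a, b) -> check q && conclusion q = (gamma @ [a], b)
     | Var _ -> false)
  | ImpE ((gamma, b), q, r) ->
    check q && check r &&
    (match conclusion q, conclusion r with
     | (g1, Imp (a, b')), (g2, a') ->
       g1 = gamma && g2 = gamma && a = a' && b' = b
     | _ -> false)

(* Example: the proof of A => A consisting of an axiom followed by
   an introduction of implication. *)
let id_proof =
  let a = Var "A" in
  ImpI (([], Imp (a, a)), Ax ([a], a))
```

With these definitions `check id_proof` evaluates to `true`, whereas `check (Ax ([], Var "A"))` evaluates to `false` since the context is empty.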


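Example 2.2.5.1. The formula $(A \land B) \Rightarrow (A \lor B)$ is provable (for any formulas $A$
and $B$):

\[
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{}{A \land B \vdash A \land B}\ (\text{ax})
    }{A \land B \vdash A}\ (\land_E^l)
  }{A \land B \vdash A \lor B}\ (\lor_I^l)
}{\vdash A \land B \Rightarrow A \lor B}\ (\Rightarrow_I)
\]

Example 2.2.5.2. The formula $(A \lor B) \Rightarrow (B \lor A)$ is provable:

\[
\dfrac{
  \dfrac{
    \dfrac{}{A \lor B \vdash A \lor B}\ (\text{ax})
    \qquad
    \dfrac{\dfrac{}{A \lor B, A \vdash A}\ (\text{ax})}{A \lor B, A \vdash B \lor A}\ (\lor_I^r)
    \qquad
    \dfrac{\dfrac{}{A \lor B, B \vdash B}\ (\text{ax})}{A \lor B, B \vdash B \lor A}\ (\lor_I^l)
  }{A \lor B \vdash B \lor A}\ (\lor_E)
}{\vdash A \lor B \Rightarrow B \lor A}\ (\Rightarrow_I)
\]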


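Example 2.2.5.3. The formula $A \Rightarrow \lnot\lnot A$ is provable:

\[
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{}{A, \lnot A \vdash \lnot A}\ (\text{ax})
      \qquad
      \dfrac{}{A, \lnot A \vdash A}\ (\text{ax})
    }{A, \lnot A \vdash \bot}\ (\lnot_E)
  }{A \vdash \lnot\lnot A}\ (\lnot_I)
}{\vdash A \Rightarrow \lnot\lnot A}\ (\Rightarrow_I)
\]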


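Example 2.2.5.4. The formula $(A \Rightarrow B) \Rightarrow (\lnot B \Rightarrow \lnot A)$ is provable:

\[
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{
        \dfrac{}{A \Rightarrow B, \lnot B, A \vdash \lnot B}\ (\text{ax})
        \qquad
        \dfrac{
          \dfrac{}{A \Rightarrow B, \lnot B, A \vdash A \Rightarrow B}\ (\text{ax})
          \qquad
          \dfrac{}{A \Rightarrow B, \lnot B, A \vdash A}\ (\text{ax})
        }{A \Rightarrow B, \lnot B, A \vdash B}\ (\Rightarrow_E)
      }{A \Rightarrow B, \lnot B, A \vdash \bot}\ (\lnot_E)
    }{A \Rightarrow B, \lnot B \vdash \lnot A}\ (\lnot_I)
  }{A \Rightarrow B \vdash \lnot B \Rightarrow \lnot A}\ (\Rightarrow_I)
}{\vdash (A \Rightarrow B) \Rightarrow \lnot B \Rightarrow \lnot A}\ (\Rightarrow_I)
\]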


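Example 2.2.5.5. The formula $(\lnot A \lor B) \Rightarrow (A \Rightarrow B)$ is provable:

\[
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{}{\lnot A \lor B, A \vdash \lnot A \lor B}\ (\text{ax})
      \qquad
      \dfrac{
        \dfrac{
          \dfrac{}{\lnot A \lor B, A, \lnot A \vdash \lnot A}\ (\text{ax})
          \qquad
          \dfrac{}{\lnot A \lor B, A, \lnot A \vdash A}\ (\text{ax})
        }{\lnot A \lor B, A, \lnot A \vdash \bot}\ (\lnot_E)
      }{\lnot A \lor B, A, \lnot A \vdash B}\ (\bot_E)
      \qquad
      \dfrac{}{\lnot A \lor B, A, B \vdash B}\ (\text{ax})
    }{\lnot A \lor B, A \vdash B}\ (\lor_E)
  }{\lnot A \lor B \vdash A \Rightarrow B}\ (\Rightarrow_I)
}{\vdash (\lnot A \lor B) \Rightarrow (A \Rightarrow B)}\ (\Rightarrow_I)
\]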


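Other typical provable formulas are


— $\land$ and $\top$ satisfy the axioms of idempotent commutative monoids:

\[ (A \land B) \land C \Leftrightarrow A \land (B \land C) \qquad\qquad A \land B \Leftrightarrow B \land A \]
\[ \top \land A \Leftrightarrow A \Leftrightarrow A \land \top \qquad\qquad A \land A \Leftrightarrow A \]

— $\lor$ and $\bot$ satisfy the axioms of idempotent commutative monoids,

— $\land$ distributes over $\lor$ and conversely:

\[ A \land (B \lor C) \Leftrightarrow (A \land B) \lor (A \land C) \]
\[ A \lor (B \land C) \Leftrightarrow (A \lor B) \land (A \lor C) \]

— $\Rightarrow$ is reflexive and transitive:

\[ A \Rightarrow A \qquad\qquad (A \Rightarrow B) \Rightarrow (B \Rightarrow C) \Rightarrow (A \Rightarrow C) \]

— currying:

\[ ((A \land B) \Rightarrow C) \Leftrightarrow (A \Rightarrow (B \Rightarrow C)) \]

— usual reasoning structures with latin names, such as

$(A \Rightarrow B) \Rightarrow (\lnot B \Rightarrow \lnot A)$ (modus tollens)
$(A \lor B) \Rightarrow (\lnot A \Rightarrow B)$ (modus tollendo ponens)
$\lnot(A \land B) \Rightarrow (A \Rightarrow \lnot B)$ (modus ponendo tollens)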
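
In the spirit of section 1.5, some of the formulas in the list above can be sketched as OCaml programs. The function names below (curry, uncurry, mtp, mpt) are ours, and the types empty and coprod as well as empty_elim are repeated from section 1.5.2 so that the snippet stands alone.

```ocaml
(* Types and eliminator from section 1.5.2, repeated for self-containedness. *)
type empty = |
type ('a, 'b) coprod = Left of 'a | Right of 'b
let empty_elim : empty -> 'a = fun x -> match x with _ -> .

(* Currying: the two directions of the equivalence. *)
let curry : ('a * 'b -> 'c) -> ('a -> 'b -> 'c) = fun f a b -> f (a, b)
let uncurry : ('a -> 'b -> 'c) -> ('a * 'b -> 'c) = fun f (a, b) -> f a b

(* Modus tollendo ponens: from A or B, and not A, deduce B. *)
let mtp : ('a, 'b) coprod -> ('a -> empty) -> 'b = fun x na ->
  match x with
  | Left a -> empty_elim (na a)
  | Right b -> b

(* Modus ponendo tollens: from not (A and B), and A, deduce not B. *)
let mpt : ('a * 'b -> empty) -> 'a -> ('b -> empty) = fun n a b -> n (a, b)
```

Only abstraction, application, pairing and pattern matching are used, so these definitions avoid the pitfalls discussed in section 1.5.3.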


Reasoning on proofs. In this formalism, the proofs are defined inductively and 
therefore we can reason by induction on them, which is often useful. Precisely, 
the induction principle on proofs is the following one: 

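Theorem 2.2.5.6 (Induction on proofs). Suppose given a predicate $P(\pi)$ on
proofs $\pi$. Suppose moreover that for every rule of figure 2.1 and every proof $\pi$
ending with this rule

\[
\pi = \dfrac{
  \begin{array}{c}\pi_1\\ \Gamma_1 \vdash A_1\end{array}
  \qquad \cdots \qquad
  \begin{array}{c}\pi_n\\ \Gamma_n \vdash A_n\end{array}
}{\Gamma \vdash A}
\]

if $P(\pi_i)$ holds for every index $i$, with $1 \leq i \leq n$, then $P(\pi)$ also holds. Then
$P(\pi)$ holds for every proof $\pi$.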


2.2.6 Fragments. A fragment of intuitionistic logic is a system obtained by 
restricting to formulas containing only certain connectives and the rules con- 
cerning these connectives. By convention, the axiom rule (ax) is present in 
every fragment. For instance, the implicational fragment of intuitionistic logic 
is obtained by restricting to implication: formulas are generated by the grammar 


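\[ A, B ::= X \mid A \Rightarrow B \]

and the rules are

\[
\dfrac{}{\Gamma, A, \Gamma' \vdash A}\ (\text{ax})
\qquad
\dfrac{\Gamma \vdash A \Rightarrow B \qquad \Gamma \vdash A}{\Gamma \vdash B}\ (\Rightarrow_E)
\qquad
\dfrac{\Gamma, A \vdash B}{\Gamma \vdash A \Rightarrow B}\ (\Rightarrow_I)
\]

The cartesian fragment is obtained by restricting to product and implication.
Another useful fragment is minimal logic, obtained by considering formulas without
$\bot$, and thus removing the rule $(\bot_E)$.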


2.2.7 Admissible rules. A rule is admissible when, whenever the premises are 
provable, the conclusion is also provable. An important point here is that the 
way the proof of the conclusion is constructed might depend on the proofs of the 
premises, and not only on the fact that we know that the premises are provable. 


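Structural rules. We begin by showing that the structural rules are admissible.
Those rules are named in this way because they concern the structure of the
logical proofs, as opposed to the particular connectives we are considering for
formulas. They express some resource management possibilities for the hypotheses
in sequents: we can permute, merge and weaken them, see section 2.2.10.

A first admissible rule is the weakening rule, which states that whenever
one can prove a formula with some hypotheses, we can still prove it with more
hypotheses. The proof with more hypotheses is “weaker” in the sense that it
applies in fewer cases (since more hypotheses have to be satisfied).

Proposition 2.2.7.1 (Weakening). The weakening rule

\[ \dfrac{\Gamma, \Gamma' \vdash B}{\Gamma, A, \Gamma' \vdash B}\ (\text{wk}) \]

is admissible.

Proof. By induction on the proof of the hypothesis $\Gamma, \Gamma' \vdash B$.

— If the proof is of the form

\[ \dfrac{}{\Gamma, \Gamma' \vdash B}\ (\text{ax}) \]

with $B$ occurring in $\Gamma$ or $\Gamma'$, then we conclude with

\[ \dfrac{}{\Gamma, A, \Gamma' \vdash B}\ (\text{ax}) \]

— If the proof is of the form

\[
\dfrac{
  \begin{array}{c}\pi_1\\ \Gamma, \Gamma' \vdash B \Rightarrow C\end{array}
  \qquad
  \begin{array}{c}\pi_2\\ \Gamma, \Gamma' \vdash B\end{array}
}{\Gamma, \Gamma' \vdash C}\ (\Rightarrow_E)
\]

then we conclude with

\[
\dfrac{
  \begin{array}{c}\pi_1'\\ \Gamma, A, \Gamma' \vdash B \Rightarrow C\end{array}
  \qquad
  \begin{array}{c}\pi_2'\\ \Gamma, A, \Gamma' \vdash B\end{array}
}{\Gamma, A, \Gamma' \vdash C}\ (\Rightarrow_E)
\]

where $\pi_1'$ and $\pi_2'$ are respectively obtained from $\pi_1$ and $\pi_2$ by induction
hypothesis:

\[
\pi_1' = \dfrac{\begin{array}{c}\pi_1\\ \Gamma, \Gamma' \vdash B \Rightarrow C\end{array}}{\Gamma, A, \Gamma' \vdash B \Rightarrow C}\ (\text{wk})
\qquad\qquad
\pi_2' = \dfrac{\begin{array}{c}\pi_2\\ \Gamma, \Gamma' \vdash B\end{array}}{\Gamma, A, \Gamma' \vdash B}\ (\text{wk})
\]

— If the proof is of the form

\[
\dfrac{\begin{array}{c}\pi\\ \Gamma, \Gamma', B \vdash C\end{array}}{\Gamma, \Gamma' \vdash B \Rightarrow C}\ (\Rightarrow_I)
\]

then we conclude with

\[
\dfrac{\begin{array}{c}\pi'\\ \Gamma, A, \Gamma', B \vdash C\end{array}}{\Gamma, A, \Gamma' \vdash B \Rightarrow C}\ (\Rightarrow_I)
\]

where $\pi'$ is obtained from $\pi$ by induction hypothesis.

— Other cases are similar.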


Also admissible is the exchange rule, which states that we can reorder hypotheses
in the contexts:


Proposition 2.2.7.2 (Exchange). The exchange rule

\[ \dfrac{\Gamma, A, B, \Gamma' \vdash C}{\Gamma, B, A, \Gamma' \vdash C}\ (\text{xch}) \]

is admissible.


Proof. By induction on the proof of the hypothesis $\Gamma, A, B, \Gamma' \vdash C$.


Given a proof $\pi$ of some sequent, we often write $w(\pi)$ for a proof obtained by
weakening. Another admissible rule is contraction, which states that if we can
prove a formula with two occurrences of a hypothesis, we can also prove it with
one occurrence.


Proposition 2.2.7.3 (Contraction). The contraction rule

\[ \dfrac{\Gamma, A, A, \Gamma' \vdash B}{\Gamma, A, \Gamma' \vdash B}\ (\text{contr}) \]

is admissible.


Proof. By induction on the proof of the hypothesis $\Gamma, A, A, \Gamma' \vdash B$.


We can also formalize the fact that knowing $\top$ does not bring information, which
we call here truth strengthening (we are not aware of a standard terminology
for this one):


Proposition 2.2.7.4 (Truth strengthening). The following rule is admissible:

\[ \dfrac{\Gamma, \top, \Gamma' \vdash A}{\Gamma, \Gamma' \vdash A}\ (\text{tstr}) \]

Proof. By induction on the proof of the hypothesis $\Gamma, \top, \Gamma' \vdash A$, the only “subtle”
case being that we have to transform

\[
\dfrac{}{\Gamma, \top, \Gamma' \vdash \top}\ (\text{ax})
\qquad\text{into}\qquad
\dfrac{}{\Gamma, \Gamma' \vdash \top}\ (\top_I)
\]

Alternatively, the admissibility of the rule can also be deduced from the admissibility
of the cut rule (see theorem 2.2.7.5 below).


The cut rule. A most important admissible rule is the cut rule, which states
that if we can prove a formula $B$ using a hypothesis $A$ (thought of as a lemma
used in the proof) and we can prove the hypothesis $A$, then we can directly
prove the formula $B$.


Theorem 2.2.7.5 (Cut). The cut rule

\[ \dfrac{\Gamma \vdash A \qquad \Gamma, A, \Gamma' \vdash B}{\Gamma, \Gamma' \vdash B}\ (\text{cut}) \]

is admissible.

Proof. For simplicity, we restrict ourselves to the case where the context $\Gamma'$
is empty, which is not an important limitation because the exchange rule is
admissible. The cut rule can be derived from the rules of implication by

\[
\dfrac{
  \dfrac{\Gamma, A \vdash B}{\Gamma \vdash A \Rightarrow B}\ (\Rightarrow_I)
  \qquad
  \Gamma \vdash A
}{\Gamma \vdash B}\ (\Rightarrow_E)
\]

We will see in section 2.3.2 that the above proof is not satisfactory and will 
provide another one, which brings much more information about the dynamics 
of the proofs. 
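
Read through the correspondence of section 1.5, the derivation of the cut rule from the rules of implication amounts to building a function and immediately applying it. The following sketch, where the name cut is ours, makes this explicit for an empty context.

```ocaml
(* Cut, for an empty context: from a proof a of A and a proof f of B
   depending on a hypothesis A, we get a proof of B by applying f to a.
   The function f plays the role of the introduction of implication and
   the application plays the role of its elimination. *)
let cut : 'a -> ('a -> 'b) -> 'b = fun a f -> f a
```

For instance, `cut 3 (fun x -> x + x)` computes the same value as `let x = 3 in x + x`: the lemma is stated once and then used where needed.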


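Admissible rules via implication. Many rules can be proved to be admissible by
eliminating provable implications:

Lemma 2.2.7.6. Suppose that the formula $A \Rightarrow B$ is provable. Then the rule

\[ \dfrac{\Gamma \vdash A}{\Gamma \vdash B} \]

is admissible.

Proof. We have

\[
\dfrac{
  \dfrac{\vdash A \Rightarrow B}{\Gamma \vdash A \Rightarrow B}\ (\text{wk})
  \qquad
  \Gamma \vdash A
}{\Gamma \vdash B}\ (\Rightarrow_E)
\]

For instance, we have seen in example 2.2.5.4 that the implication

$(A \Rightarrow B) \Rightarrow (\lnot B \Rightarrow \lnot A)$

is provable. We immediately deduce:


Lemma 2.2.7.7 (Modus tollens). The following two variants of the modus tollens
rule

\[
\dfrac{\Gamma \vdash A \Rightarrow B}{\Gamma \vdash \lnot B \Rightarrow \lnot A}
\qquad\qquad
\dfrac{\Gamma \vdash A \Rightarrow B \qquad \Gamma \vdash \lnot B}{\Gamma \vdash \lnot A}
\]

are admissible.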


2.2.8 Definable connectives. A logical connective is definable when it can be
expressed from other connectives in such a way that replacing the connective by
its expression and removing the associated logical rules preserves provability.

Lemma 2.2.8.1. Negation is definable as $\lnot A = A \Rightarrow \bot$.

Proof. The introduction and elimination rules of $\lnot$ are derivable: once $\lnot A$ is
replaced by $A \Rightarrow \bot$, they are instances of the rules for implication

\[
\dfrac{\Gamma \vdash A \Rightarrow \bot \qquad \Gamma \vdash A}{\Gamma \vdash \bot}\ (\Rightarrow_E)
\qquad\qquad
\dfrac{\Gamma, A \vdash \bot}{\Gamma \vdash A \Rightarrow \bot}\ (\Rightarrow_I)
\]

from which it follows that, given a provable formula $A$, the formula $A'$ obtained
from $A$ by changing all connectives $\lnot -$ into $- \Rightarrow \bot$ is provable, without
using $(\lnot_E)$ and $(\lnot_I)$. Conversely, suppose given a formula $A$, such that the
transformed formula $A'$ is provable. We have to show that $A$ is also provable,
which is more subtle. In the proof of $A'$, for each subproof of the form

\[ \begin{array}{c}\pi\\ \Gamma \vdash B \Rightarrow \bot\end{array} \]

where the conclusion $B \Rightarrow \bot$ corresponds to the presence of $\lnot B$ as a subformula
of $A$, we can transform the proof as follows:

\[
\dfrac{
  \dfrac{
    \dfrac{\begin{array}{c}\pi\\ \Gamma \vdash B \Rightarrow \bot\end{array}}{\Gamma, B \vdash B \Rightarrow \bot}\ (\text{wk})
    \qquad
    \dfrac{}{\Gamma, B \vdash B}\ (\text{ax})
  }{\Gamma, B \vdash \bot}\ (\Rightarrow_E)
}{\Gamma \vdash \lnot B}\ (\lnot_I)
\]

Applying this transformation enough times, we can transform the proof of $A'$
into a proof of $A$. A variant of this proof is given in corollary 2.2.9.2.


Lemma 2.2.8.2. Truth is definable as $A \Rightarrow A$, for any provable formula $A$ not
involving $\top$. For instance: $\top = (\bot \Rightarrow \bot)$.


Remark 2.2.8.3. In intuitionistic logic, contrarily to what we expect from the
usual de Morgan formulas, the implication is not definable as

$A \Rightarrow B = \lnot A \lor B$

see sections 2.3.5 and 2.5.1.


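2.2.9 Equivalence. We could have added to the syntax of our formulas an
equivalence connective $\Leftrightarrow$ with associated rules

\[
\dfrac{\Gamma \vdash A \Leftrightarrow B}{\Gamma \vdash A \Rightarrow B}
\qquad
\dfrac{\Gamma \vdash A \Leftrightarrow B}{\Gamma \vdash B \Rightarrow A}
\qquad
\dfrac{\Gamma \vdash A \Rightarrow B \qquad \Gamma \vdash B \Rightarrow A}{\Gamma \vdash A \Leftrightarrow B}
\]

It would have been definable as

$A \Leftrightarrow B = (A \Rightarrow B) \land (B \Rightarrow A)$

Two formulas $A$ and $B$ are equivalent when $A \Leftrightarrow B$ is provable. This notion of
equivalence relates in the expected way to provability:


Lemma 2.2.9.1. If $A$ and $B$ are equivalent then, for every context $\Gamma$, $\Gamma \vdash A$ is
provable if and only if $\Gamma \vdash B$ is provable.


Proof. Immediate application of lemma 2.2.7.6.


In this way, we can give a variant of the proof of lemma 2.2.8.1:

Corollary 2.2.9.2. Negation is definable as $\lnot A = (A \Rightarrow \bot)$.

Proof. We have $\lnot A \Leftrightarrow (A \Rightarrow \bot)$:

\[
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{
        \dfrac{}{\lnot A, A \vdash \lnot A}\ (\text{ax})
        \qquad
        \dfrac{}{\lnot A, A \vdash A}\ (\text{ax})
      }{\lnot A, A \vdash \bot}\ (\lnot_E)
    }{\lnot A \vdash A \Rightarrow \bot}\ (\Rightarrow_I)
  }{\vdash \lnot A \Rightarrow A \Rightarrow \bot}\ (\Rightarrow_I)
  \qquad
  \dfrac{
    \dfrac{
      \dfrac{
        \dfrac{}{A \Rightarrow \bot, A \vdash A \Rightarrow \bot}\ (\text{ax})
        \qquad
        \dfrac{}{A \Rightarrow \bot, A \vdash A}\ (\text{ax})
      }{A \Rightarrow \bot, A \vdash \bot}\ (\Rightarrow_E)
    }{A \Rightarrow \bot \vdash \lnot A}\ (\lnot_I)
  }{\vdash (A \Rightarrow \bot) \Rightarrow \lnot A}\ (\Rightarrow_I)
}{\vdash \lnot A \Leftrightarrow (A \Rightarrow \bot)}\ (\Leftrightarrow_I)
\]

and we conclude using lemma 2.2.9.1.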


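2.2.10 Structural rules. The rules of exchange, contraction, weakening and
truth strengthening are often called structural rules:

\[
\dfrac{\Gamma, A, B, \Gamma' \vdash C}{\Gamma, B, A, \Gamma' \vdash C}\ (\text{xch})
\qquad\qquad
\dfrac{\Gamma, A, A, \Gamma' \vdash B}{\Gamma, A, \Gamma' \vdash B}\ (\text{contr})
\]

\[
\dfrac{\Gamma, \Gamma' \vdash B}{\Gamma, A, \Gamma' \vdash B}\ (\text{wk})
\qquad\qquad
\dfrac{\Gamma, \top, \Gamma' \vdash A}{\Gamma, \Gamma' \vdash A}\ (\text{tstr})
\]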


We have seen in section 2.2.7 that they are admissible in our system. 
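
As an informal illustration, if we read a sequent $A, B \vdash C$ as the type of functions taking a proof of $A$ and a proof of $B$ to a proof of $C$, the first three structural rules correspond to familiar ways of manipulating function arguments. This is only a sketch under that reading, with names of our choosing; the precise correspondence is the subject of later chapters.

```ocaml
(* Exchange: swap the order of two hypotheses (arguments). *)
let exchange : ('a -> 'b -> 'c) -> ('b -> 'a -> 'c) = fun f b a -> f a b

(* Contraction: use a single hypothesis twice. *)
let contraction : ('a -> 'a -> 'b) -> ('a -> 'b) = fun f a -> f a a

(* Weakening: accept an additional hypothesis and ignore it. *)
let weakening : 'b -> ('a -> 'b) = fun b _ -> b
```

For instance, `contraction (+) 4` evaluates to `8`, the argument `4` being duplicated.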


Contexts as sets. The rules of exchange and contraction allow us to think of contexts
as sets (rather than lists) of formulas, because a set is a list “up to permutation
and duplication of its elements”. More precisely, given a set $\mathcal{A}$, we write
$\mathcal{P}(\mathcal{A})$ for the set of subsets of $\mathcal{A}$, and $\mathcal{A}^*$ for the set of lists of elements of $\mathcal{A}$.
We define an equivalence relation $\sim$ on $\mathcal{A}^*$ as the smallest equivalence relation
such that

\[ \Gamma, A, B, \Delta \sim \Gamma, B, A, \Delta \qquad\qquad \Gamma, A, A, \Delta \sim \Gamma, A, \Delta \]

Lemma 2.2.10.1. The function $f : \mathcal{A}^* \to \mathcal{P}(\mathcal{A})$ which to a list associates its set
of elements is surjective. Moreover, given $\Gamma, \Delta \in \mathcal{A}^*$, we have $f(\Gamma) = f(\Delta)$ if
and only if $\Gamma \sim \Delta$.
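
The lemma suggests a simple way of testing whether two lists are related: compare their sets of elements. Here is a minimal sketch, where finite sets are represented by sorted lists without duplicates; the names to_set and equiv are ours.

```ocaml
(* The function f of the lemma: the set of elements of a list,
   represented as a sorted list without duplicates. *)
let to_set (l : 'a list) : 'a list = List.sort_uniq compare l

(* Two lists are equivalent up to permutation and duplication exactly
   when they have the same set of elements. *)
let equiv (l1 : 'a list) (l2 : 'a list) : bool = to_set l1 = to_set l2
```

For instance `equiv ["A"; "B"; "B"] ["B"; "A"]` holds, whereas `equiv ["A"] ["A"; "C"]` does not.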


We could therefore have directly defined contexts to be sets of formulas, as is
sometimes done, but this would be really unsatisfactory. Namely, a formula $A$
in a context can be thought of as some kind of hypothesis which is to be proved
by an auxiliary lemma and we might have twice the same formula $A$, but proved
by different means: in this case, we would like to be able to refer to a particular
instance of $A$ (which is proved in a particular way), and we cannot do this
if we have a set of hypotheses. For instance, there are intuitively two proofs
of $A \Rightarrow A \Rightarrow A$: the one which uses the left $A$ to prove $A$ and the one which
uses the right one (this will become even more striking with the Curry-Howard
correspondence, see remark 4.1.7.2). However, with contexts as sets, both are
the same:

\[
\dfrac{
  \dfrac{
    \dfrac{}{A \vdash A}\ (\text{ax})
  }{A \vdash A \Rightarrow A}\ (\Rightarrow_I)
}{\vdash A \Rightarrow A \Rightarrow A}\ (\Rightarrow_I)
\]

A less harmful simplification which is sometimes done is to quotient by exchange
only (and not contraction), in which case the contexts become multisets, see
appendix A.3.5. We will refrain from doing that here as well.

Variants of the proof system. The structural rules are usually taken as “real”
(as opposed to admissible) rules of the proof system. Here, we have carefully
chosen the formulation of rules, so that they are admissible, but it would not
hold anymore if we had used subtle variants instead. For instance, if we replace
the axiom rule by

\[
\dfrac{}{\Gamma, A \vdash A}\ (\text{ax})
\qquad\text{or}\qquad
\dfrac{}{A \vdash A}\ (\text{ax})
\]

or replace the introduction rule for conjunction by

\[ \dfrac{\Gamma \vdash A \qquad \Delta \vdash B}{\Gamma, \Delta \vdash A \land B}\ (\land_I) \]

the structural rules are not all admissible anymore. The study of the fine structure
behind this led Girard to introduce linear logic [Gir87].


2.2.11 Substitution. Given formulas $A$ and $B$ and a variable $X$, we write

$A[B/X]$

for the substitution of $X$ by $B$ in $A$, i.e. the formula $A$ where all the occurrences
of $X$ have been replaced by $B$. More generally, a substitution for $A$ is a function $\sigma$
which to every variable $X$ occurring in $A$ assigns a formula $\sigma(X)$, and we also
write

$A[\sigma]$

for the formula $A$ where every variable $X$ has been replaced by $\sigma(X)$. Similarly,
given a context $\Gamma = A_1, \ldots, A_n$, we define

$\Gamma[\sigma] = A_1[\sigma], \ldots, A_n[\sigma]$

We often write

$[A_1/X_1, \ldots, A_n/X_n]$

for the substitution $\sigma$ such that $\sigma(X_i) = A_i$ and $\sigma(X) = X$ for $X$ different from
each $X_i$. It satisfies

$A[A_1/X_1, \ldots, A_n/X_n] = A[A_1/X_1] \cdots [A_n/X_n]$

(at least when the variables $X_i$ are distinct and do not occur in the formulas $A_j$).
We always suppose that, for a substitution $\sigma$, the set

$\{X \in \mathcal{X} \mid \sigma(X) \neq X\}$

is finite so that the substitution can be represented as the list of images of
elements of this set. Provable formulas are closed under substitution:


Proposition 2.2.11.1. Given a provable sequent $\Gamma \vdash A$ and a substitution $\sigma$, the
sequent $\Gamma[\sigma] \vdash A[\sigma]$ is also provable.


Proof. By induction on the proof of $\Gamma \vdash A$.
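
Since a substitution differs from the identity on finitely many variables only, it can be represented by an association list. The following sketch implements $A[\sigma]$ and $\Gamma[\sigma]$ under that representation; the type formula and all the names are ours, repeated here so that the snippet stands alone.

```ocaml
(* Formulas, with variables represented by strings (an assumption). *)
type formula =
  | Var of string
  | Imp of formula * formula
  | And of formula * formula
  | Or of formula * formula
  | True
  | False
  | Not of formula

(* A substitution is a list of pairs (variable, formula); variables
   which are not listed are left unchanged. All the variables are
   replaced simultaneously. *)
let rec subst (sigma : (string * formula) list) (a : formula) : formula =
  match a with
  | Var x -> (match List.assoc_opt x sigma with Some b -> b | None -> a)
  | Imp (b, c) -> Imp (subst sigma b, subst sigma c)
  | And (b, c) -> And (subst sigma b, subst sigma c)
  | Or (b, c) -> Or (subst sigma b, subst sigma c)
  | True | False -> a
  | Not b -> Not (subst sigma b)

(* Substitution in a context, formula by formula. *)
let subst_context sigma (gamma : formula list) : formula list =
  List.map (subst sigma) gamma
```

Because the replacement is simultaneous, `subst [("X", Var "Y"); ("Y", Var "X")]` exchanges the two variables, which iterating two single-variable substitutions would not do.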
2.3 Cut elimination


In mathematics, one often uses lemmas to show results. For instance, suppose 
that we want to show that 6 admits a half, i.e. there exists a number n such 
that n+ n= 6. We could proceed in this way by observing that 


— every even number admits a half, and 
— 6 is even. 


In this proof, we have used a lemma (even numbers can be halved) that we have 
supposed to be already proved. Of course, there was another, much shorter, 
proof of the fact that 6 admits a half: simply observe that 3 + 3 = 6. We should
be able to extract the second proof (giving directly 3 as a half) from the first
one, by looking in detail at the proof of the lemma: this process of extracting
a direct proof from a proof using a lemma is called cut elimination. We will
see that it has a number of applications and will allow us to take a “dynamic”
point of view on proofs: removing cuts corresponds to “executing” proofs.

Let us illustrate how this process works in more detail on the above example.
We first need to make precise the notions we are using here; see section 6.6.3
for a full formalization. We say that a number m is a half of a number n when
m + m = n, and the set of even numbers is defined here to be the smallest set
containing 0 and such that n + 2 is even when n is. Moreover, our lemma is
proved in this way:


Lemma 2.3.0.1. Every even number admits a half. 


Proof. Suppose given an even number n. By definition of evenness, it is of one of
the two following forms and we can reason by induction.


— If n = 0 then it admits 0 as half, since 0 + 0 = 0.


— If n = n’ + 2 with n’ even, then by induction n’ admits a half m,
i.e. m + m = n’, and therefore n admits m + 1 as half since

\[ n = n' + 2 = (m + m) + 2 = (m + 1) + (m + 1) \]


In our reasoning to prove that 6 can be halved, we have used the fact that 6 is 
even, which we must have proved in this way: 


— 6 is even because 6 = 4+ 2 and 4 is even, where 


— 4 is even because 4 = 2+ 2 and 2 is even, where 


— 2 is even because 2 = 0 + 2 and 0 is even, where 
— 0 is even by definition. 


From the proof of the lemma, we know that the half of 6 is the successor of the
half of 4, which is the successor of the half of 2, which is the successor of the half
of 0, which is 0. Writing, as usual, n/2 for a half of n, we have

\[ 6/2 = (4/2) + 1 = (2/2) + 1 + 1 = (0/2) + 1 + 1 + 1 = 0 + 1 + 1 + 1 = 3 \]

Therefore the half of 6 is 3: we have managed to extract the actual value of the
half of 6 from the proof that 6 is even and the above lemma. This example is
further formalized in section 6.6.3.
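
The extraction described above can be sketched in OCaml, ahead of the Agda formalization mentioned in the text. In this sketch, a proof of evenness is represented by a value of a type even whose constructors follow the inductive definition; the names even, Even_zero, Even_plus2, number and half are chosen for this illustration only.

```ocaml
(* A proof that a number is even, following the inductive definition:
   0 is even, and n + 2 is even when n is. *)
type even =
  | Even_zero            (* 0 is even by definition *)
  | Even_plus2 of even   (* if n is even then n + 2 is even *)

(* The number whose evenness is established by a given proof. *)
let rec number = function
  | Even_zero -> 0
  | Even_plus2 p -> number p + 2

(* The lemma read as a program: from a proof that n is even, compute a half of n. *)
let rec half = function
  | Even_zero -> 0               (* 0 + 0 = 0 *)
  | Even_plus2 p -> half p + 1   (* (m + 1) + (m + 1) = (m + m) + 2 *)

(* The proof that 6 is even: 6 = 4 + 2, 4 = 2 + 2, 2 = 0 + 2 and 0 is even. *)
let six_is_even = Even_plus2 (Even_plus2 (Even_plus2 Even_zero))
```

Evaluating half six_is_even unfolds exactly the chain of equalities displayed above, one step per constructor of the proof that 6 is even.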

2.3.1 Cuts. In logic, the use of a lemma to show a result is called a “cut”. This
must not be confused with the (cut) rule presented in theorem 2.2.7.5, although 
they are closely related. Formally, a cut in a proof is an elimination rule whose 
principal premise is proved by an introduction rule of the same connective. For 
instance, the following are cuts: 


\[
\frac{\dfrac{\overset{\pi}{\Gamma \vdash A} \qquad \overset{\pi'}{\Gamma \vdash B}}{\Gamma \vdash A \wedge B}\,(\wedge_I)}{\Gamma \vdash A}\,(\wedge_E^l)
\qquad\qquad
\frac{\dfrac{\overset{\pi}{\Gamma, A \vdash B}}{\Gamma \vdash A \Rightarrow B}\,(\Rightarrow_I) \qquad \overset{\pi'}{\Gamma \vdash A}}{\Gamma \vdash B}\,(\Rightarrow_E)
\]


The formula in the principal premise is called the cut formula: above, the cut
formulas are respectively $A \wedge B$ and $A \Rightarrow B$. A proof containing a cut intuitively
does “useless work”. Namely, the one on the left starts from a proof $\pi$ of A in
the context $\Gamma$, which it uses to prove $A \wedge B$, from which it deduces A: in order
to prove A, the proof $\pi$ was already enough and the proof $\pi'$ of B was entirely
superfluous. Similarly, for the proof on the right, we show in $\pi$ that supposing A
we can prove B, and also in $\pi'$ that we can prove A: we could certainly directly
prove B, replacing in $\pi$ all the places where the hypothesis A is used (say by an
axiom) by the proof $\pi'$. For this reason, cuts are sometimes also called detours.

From a proof-theoretic point of view, it might seem a bit strange that some- 
one would use such a kind of proof structure, but this is actually common in 
mathematics: when we want to prove a result, we often prove a lemma which 
is more general than the result we want to show and then deduce the result we 
were aiming at. One of the reasons for proceeding in this way is that we can 
use the same lemma to cover multiple cases, and thus have shorter proofs (not 
to mention that they are generally more conceptual and modular, since we can 
reuse the lemmas for other proofs). We will see that, however, we can always 
avoid using cuts in order to prove formulas. Before doing so, we first need to 
introduce the main technical result which allows this. 


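2.3.2 Proof substitution. A different kind of substitution than the one of
section 2.2.11 consists in replacing some axioms in a proof by another proof.
For instance, consider two proofs

\[
\pi \;=\;
\frac{\dfrac{\dfrac{}{\Gamma, A, B \vdash A}\,(\text{ax}) \qquad \dfrac{}{\Gamma, A, B \vdash A}\,(\text{ax})}{\Gamma, A, B \vdash A \wedge A}\,(\wedge_I)}{\Gamma, A \vdash B \Rightarrow A \wedge A}\,(\Rightarrow_I)
\qquad\qquad
\pi' \;=\; \overset{\vdots}{\Gamma \vdash A}
\]

The proof $\pi'$ allows us to deduce A from the hypotheses in $\Gamma$. Therefore, in the
proof $\pi$, each time the hypothesis A of the context is used (by an axiom rule),
we can instead use the proof $\pi'$ and reprove A. Doing so, the hypothesis A
in the context becomes superfluous and we can remove it. The proof resulting
from this transformation is thus obtained by “re-proving” A each time we need
it instead of having it as a hypothesis:

\[
\frac{\dfrac{\dfrac{\overset{\pi'}{\Gamma \vdash A}}{\Gamma, B \vdash A}\,(\text{wk}) \qquad \dfrac{\overset{\pi'}{\Gamma \vdash A}}{\Gamma, B \vdash A}\,(\text{wk})}{\Gamma, B \vdash A \wedge A}\,(\wedge_I)}{\Gamma \vdash B \Rightarrow A \wedge A}\,(\Rightarrow_I)
\]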
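This process generalizes as follows:

Proposition 2.3.2.1. Given provable sequents

\[
\overset{\pi}{\Gamma, A, \Gamma' \vdash B}
\qquad\text{and}\qquad
\overset{\pi'}{\Gamma \vdash A}
\]

the sequent $\Gamma, \Gamma' \vdash B$ is also provable, by a proof that we write as $\pi[\pi'/A]$:

\[ \overset{\pi[\pi'/A]}{\Gamma, \Gamma' \vdash B} \]

In other words, the (cut) rule

\[
\frac{\Gamma \vdash A \qquad \Gamma, A, \Gamma' \vdash B}{\Gamma, \Gamma' \vdash B}\,(\text{cut})
\]

is admissible.


Proof. By induction on $\pi$.


We will see that the admissibility of this rule is the main ingredient to prove
cut elimination, thus its name.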


2.3.3 Cut elimination. A logic has the cut elimination property when when- 
ever a formula is provable then it is also provable with a proof which does not 
involve cuts: we can always avoid doing unnecessary things. This procedure was 
introduced by Gentzen under the name Hauptsatz [Gen35]. In general, we not 
only want to know that such a proof exists, but also to have an effective cut elim- 
ination procedure which transforms a proof into one without cuts. The reason 
for this is that we will see in section 4.1.8 that this corresponds to “executing” 
the proof (or the program corresponding to it): this is why Girard [Gir87] claims 
that 


A logic without cut elimination is like a car without an engine. 


Although the proof obtained after eliminating cuts is “simpler” in the sense that 
it does not contain unnecessary steps (cuts), it cannot always be considered as 
“better”: it is generally much bigger than the original one. The quote above 
explains it: think of a program computing the factorial of 1000. We see that a 
result can be much bigger than the program computing it [Boo84], and it can 
take much time to compute [Ore82]. 


Theorem 2.3.3.1. Intuitionistic natural deduction has the cut elimination prop- 
erty. 
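
\[
\frac{\dfrac{\overset{\pi}{\Gamma, A \vdash B}}{\Gamma \vdash A \Rightarrow B}\,(\Rightarrow_I) \qquad \overset{\pi'}{\Gamma \vdash A}}{\Gamma \vdash B}\,(\Rightarrow_E)
\qquad\rightsquigarrow\qquad
\overset{\pi[\pi'/A]}{\Gamma \vdash B}
\]

\[
\frac{\dfrac{\overset{\pi}{\Gamma \vdash A} \qquad \overset{\pi'}{\Gamma \vdash B}}{\Gamma \vdash A \wedge B}\,(\wedge_I)}{\Gamma \vdash A}\,(\wedge_E^l)
\qquad\rightsquigarrow\qquad
\overset{\pi}{\Gamma \vdash A}
\]

\[
\frac{\dfrac{\overset{\pi}{\Gamma \vdash A} \qquad \overset{\pi'}{\Gamma \vdash B}}{\Gamma \vdash A \wedge B}\,(\wedge_I)}{\Gamma \vdash B}\,(\wedge_E^r)
\qquad\rightsquigarrow\qquad
\overset{\pi'}{\Gamma \vdash B}
\]

\[
\frac{\dfrac{\overset{\pi}{\Gamma \vdash A}}{\Gamma \vdash A \vee B}\,(\vee_I^l) \qquad \overset{\pi'}{\Gamma, A \vdash C} \qquad \overset{\pi''}{\Gamma, B \vdash C}}{\Gamma \vdash C}\,(\vee_E)
\qquad\rightsquigarrow\qquad
\overset{\pi'[\pi/A]}{\Gamma \vdash C}
\]

\[
\frac{\dfrac{\overset{\pi}{\Gamma \vdash B}}{\Gamma \vdash A \vee B}\,(\vee_I^r) \qquad \overset{\pi'}{\Gamma, A \vdash C} \qquad \overset{\pi''}{\Gamma, B \vdash C}}{\Gamma \vdash C}\,(\vee_E)
\qquad\rightsquigarrow\qquad
\overset{\pi''[\pi/B]}{\Gamma \vdash C}
\]


Figure 2.2: Transforming proofs in NJ in order to eliminate cuts.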


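Proof. Suppose given a proof which contains a cut. This means that at some
point in the proof we encounter one of the following situations (i.e. we have a
subproof of one of the following forms), in which case we transform the proof
as indicated by $\rightsquigarrow$ in figure 2.2 (we do not handle the cut on $\neg$ since $\neg A$ can be
coded as $A \Rightarrow \bot$). For instance,

\[
\frac{\dfrac{\dfrac{\dfrac{}{\Gamma, A \vdash A}\,(\text{ax}) \qquad \dfrac{}{\Gamma, A \vdash A}\,(\text{ax})}{\Gamma, A \vdash A \wedge A}\,(\wedge_I)}{\Gamma \vdash A \Rightarrow A \wedge A}\,(\Rightarrow_I) \qquad \overset{\pi}{\Gamma \vdash A}}{\Gamma \vdash A \wedge A}\,(\Rightarrow_E)
\]

is transformed into

\[
\frac{\overset{\pi}{\Gamma \vdash A} \qquad \overset{\pi}{\Gamma \vdash A}}{\Gamma \vdash A \wedge A}\,(\wedge_I)
\]

We iterate the process on the resulting proof until all the cuts have been removed.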
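
The transformations for implication and conjunction can be sketched as an OCaml program acting on proof skeletons. This is only an illustration under simplifying assumptions which are not part of the text: formulas are omitted, hypotheses are designated by their position in the context counted from the right (so that the hypothesis added by an introduction of implication has index 0), and the names proof, shift, subst and normalize are hypothetical.

```ocaml
(* Proof skeletons for the fragment with implication and conjunction. *)
type proof =
  | Ax of int               (* (ax) on the hypothesis of given index *)
  | ImpI of proof           (* introduction of =>: adds a hypothesis of index 0 *)
  | ImpE of proof * proof   (* elimination of =>: principal premise first *)
  | AndI of proof * proof   (* introduction of /\ *)
  | AndEl of proof          (* left elimination of /\ *)
  | AndEr of proof          (* right elimination of /\ *)

(* Add d to the indices of hypotheses which are >= c (a form of weakening). *)
let rec shift d c = function
  | Ax i -> Ax (if i >= c then i + d else i)
  | ImpI p -> ImpI (shift d (c + 1) p)
  | ImpE (p, q) -> ImpE (shift d c p, shift d c q)
  | AndI (p, q) -> AndI (shift d c p, shift d c q)
  | AndEl p -> AndEl (shift d c p)
  | AndEr p -> AndEr (shift d c p)

(* Replace the axioms on the hypothesis of index j by the proof s. *)
let rec subst j s = function
  | Ax i -> if i = j then s else Ax i
  | ImpI p -> ImpI (subst (j + 1) (shift 1 0 s) p)
  | ImpE (p, q) -> ImpE (subst j s p, subst j s q)
  | AndI (p, q) -> AndI (subst j s p, subst j s q)
  | AndEl p -> AndEl (subst j s p)
  | AndEr p -> AndEr (subst j s p)

(* Eliminate cuts, starting with those closest to the axioms.
   This might loop on skeletons which do not come from actual proofs. *)
let rec normalize p =
  match p with
  | Ax _ -> p
  | ImpI p -> ImpI (normalize p)
  | AndI (p, q) -> AndI (normalize p, normalize q)
  | ImpE (p, q) ->
    (match normalize p, normalize q with
     | ImpI body, q ->
       (* cut on =>: substitute q for the hypothesis 0 of body *)
       normalize (shift (-1) 0 (subst 0 (shift 1 0 q) body))
     | p, q -> ImpE (p, q))
  | AndEl p -> (match normalize p with AndI (l, _) -> l | p -> AndEl p)
  | AndEr p -> (match normalize p with AndI (_, r) -> r | p -> AndEr p)
```

With these conventions, the example above corresponds to ImpE (ImpI (AndI (Ax 0, Ax 0)), p), which normalize turns into AndI (p, p) when p is cut-free: the proof p is duplicated, as observed in the text.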

As can be noticed in the above example, applying the transformation $\rightsquigarrow$
might duplicate cuts: if the above proof $\pi$ contained cuts, then the transformed
proof contains twice the cuts of $\pi$. It is therefore not clear that the process
actually terminates, whichever order we choose to eliminate cuts. We will see
in section 4.2 that it indeed does, but the proof will be quite involved. It
is sufficient for now to show that a particular strategy for eliminating cuts is
terminating: at each step, we suppose that we eliminate a cut of highest depth,
i.e. there is no cut “closer to the axioms” (for instance, we could apply the above
transformation only if $\pi$ has no cuts). We define the size $|A|$ of a formula A as
its number of connectives and variables:

\[
|X| = |\top| = |\bot| = 1
\qquad\qquad
|A \Rightarrow B| = |A \wedge B| = |A \vee B| = 1 + |A| + |B|
\]

The degree of a cut is the size of the cut formula (e.g. of $A \Rightarrow A \wedge A$ in the above
example, whose size is $2 + 3|A|$), and the degree of a proof is then defined as the
multiset (see appendix A.3.5) of the degrees of the cuts it contains. It can then
be checked that whenever we apply $\rightsquigarrow$, the newly created cuts are of strictly
lower degree than the cut we eliminated and therefore the degree of the proof
decreases according to the multiset order, see appendix A.3.5. For instance, if
we apply a transformation

\[
\frac{\dfrac{\overset{\pi}{\Gamma, A \vdash B}}{\Gamma \vdash A \Rightarrow B}\,(\Rightarrow_I) \qquad \overset{\pi'}{\Gamma \vdash A}}{\Gamma \vdash B}\,(\Rightarrow_E)
\qquad\rightsquigarrow\qquad
\overset{\pi[\pi'/A]}{\Gamma \vdash B}
\]

we suppose that $\pi'$ has no cuts (otherwise the eliminated cut would not be
of highest depth). The degree of the cut is $|A \Rightarrow B|$. All the cuts present
in the resulting proof were already present in the original proof, except for
the new cuts on A which might be created by the substitution of $\pi'$ in $\pi$,
which are of degree $|A| < |A \Rightarrow B|$. Since the multiset order is well-founded,
see theorem A.3.5.1, the process will eventually come to an end: we cannot have
an infinite sequence of $\rightsquigarrow$ transformations, chosen according to our strategy.
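
The measure used in this termination argument is easy to compute. As an illustration, here is a possible OCaml definition of the size of a formula; the type formula and its constructor names are assumptions made for this sketch.

```ocaml
(* Formulas (constructor names are illustrative). *)
type formula =
  | Var of string
  | Top
  | Bot
  | Imp of formula * formula
  | And of formula * formula
  | Or of formula * formula

(* Size of a formula: its number of connectives and variables. *)
let rec size = function
  | Var _ | Top | Bot -> 1
  | Imp (a, b) | And (a, b) | Or (a, b) -> 1 + size a + size b

(* Degree of the cut of the example above, whose cut formula is A => A /\ A. *)
let example_degree a = size (Imp (a, And (a, a)))
```

For any formula a, example_degree a is 2 + 3 * size a, and size a is strictly smaller than size (Imp (a, b)), which is the inequality used at the end of the proof.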


The previous theorem states that, as long as we are interested in provability, 
we can restrict ourselves to cut-free proofs. This is of interest because we often 
have a good idea of which rules can be used in those. In particular, we have the 
following useful result: 


Proposition 2.3.3.2. For any formula A, a cut-free proof of $\vdash A$ necessarily ends
with an introduction rule.


Proof. Consider a cut-free proof $\pi$ of $\vdash A$. We reason by induction on it.
This proof cannot be an axiom because the context is empty. Suppose that $\pi$
ends with an elimination rule:

\[
\pi \;=\; \frac{\vdots}{\vdash A}\,(?_E)
\]

For each of the elimination rules, we observe that the principal premise is
necessarily of the form $\vdash A'$, and therefore ends with an introduction rule, by
induction hypothesis. The proof is then of the form

\[
\pi \;=\; \frac{\dfrac{\vdots}{\vdash A'}\,(?_I) \qquad \ldots}{\vdash A}\,(?_E)
\]

and thus contains a cut, which is impossible since we have supposed $\pi$ to be
cut-free. Since $\pi$ cannot end with an axiom nor an elimination rule, it necessarily
ends with an introduction rule.


In the above proposition, it is crucial that we consider a formula in an empty
context: a cut-free proof of $\Gamma \vdash A$ does not necessarily end with an introduction
rule if $\Gamma$ is arbitrary.


2.3.4 Consistency. The least one can expect from a non-trivial logical system
is that not every formula is provable, otherwise the system is of no use. A logical
system is consistent when there is at least one formula which cannot be proved
in the system. Since, by $(\bot_E)$, one can deduce any formula from $\bot$, we have:


Lemma 2.3.4.1. The following are equivalent:
(i) the logical system is consistent,
(ii) the formula $\bot$ cannot be proved,
(iii) the principle of non-contradiction holds: there is no formula A such that
both A and $\neg A$ can be proved.


Theorem 2.3.4.2. The system NJ is consistent.


Proof. Suppose that it is inconsistent, i.e. by lemma 2.3.4.1 that it can prove $\bot$.
By theorem 2.3.3.1, there is a cut-free proof of $\vdash \bot$ and, by proposition 2.3.3.2,
this proof necessarily ends with an introduction rule. However, there is no
introduction rule for $\bot$, contradiction.


Remark 2.3.4.3. As a side note, we would like to point out that if we naively
allowed proofs to be infinite or cyclic (i.e. contain themselves as subproofs), then
the system would not be consistent anymore. For instance, we could prove $\bot$
by

\[
\pi \;=\;
\frac{\dfrac{\dfrac{}{\bot \vdash \bot}\,(\text{ax})}{\vdash \bot \Rightarrow \bot}\,(\Rightarrow_I) \qquad \overset{\pi}{\vdash \bot}}{\vdash \bot}\,(\Rightarrow_E)
\]

(this proof is infinite in the sense that we should replace $\pi$ by the proof itself
above). Also, for such a proof, the cut elimination procedure would not
terminate...


CHAPTER 2. PROPOSITIONAL LOGIC 62 


2.3.5 Intuitionism. We have explained in the introduction that the intuition- 
istic point of view on proofs is that they should be “accessible to intuition” or 
“constructive”. This entails in particular that a proof of a disjunction $A \vee B$
should imply that one of the two formulas A or B is provable: we not only know
that the disjunction is true, but we can explicitly say which one of A or B is
true. This property is satisfied by the system NJ we have defined above, and
this explains why we have said that it is intuitionistic:


Proposition 2.3.5.1. If a formula $A \vee B$ is provable in NJ then either A or B is
provable.


Proof. Suppose that we have a proof of $A \vee B$. By theorem 2.3.3.1, we can
suppose that this proof is cut-free and thus ends with an introduction rule by
proposition 2.3.3.2. The proof is thus of one of the following two forms

\[
\frac{\overset{\pi}{\vdash A}}{\vdash A \vee B}\,(\vee_I^l)
\qquad\qquad
\frac{\overset{\pi}{\vdash B}}{\vdash A \vee B}\,(\vee_I^r)
\]

which means that we either have a proof of A or a proof of B.


While quite satisfactory, this property means that truth in our logical systems
behaves differently from the usual systems (e.g. validity in boolean models),
which are called classical by contrast. Every formula provable in NJ is true in
classical systems, but the converse is not true. One of the most striking examples
is the so-called principle of excluded middle stating that, for any formula A, the
formula

\[ \neg A \vee A \]

should hold. While this is certainly classically true, this cannot be proved
intuitionistically for a general formula A:


Lemma 2.3.5.2. Given a propositional variable X, the formula $\neg X \vee X$ cannot
be proved in NJ.


Proof. Suppose that it is provable. By proposition 2.3.5.1, either $\neg X$ or X is
provable and by theorem 2.3.3.1 and proposition 2.3.3.2, we can assume that
this proof is cut-free and ends with an introduction rule. Clearly, $\vdash X$ is not
provable (because there is no corresponding introduction rule), so that we must
have a cut-free proof of the form

\[
\frac{\overset{\pi}{X \vdash \bot}}{\vdash \neg X}\,(\neg_I)
\]

By proposition 2.2.11.1, if we had such a proof, we would in particular have one
where X is replaced by $\top$:

\[ \overset{\pi'}{\top \vdash \bot} \]

but, by proposition 2.2.7.4, we could remove $\top$ from the hypotheses and obtain
a proof of $\vdash \bot$, which is impossible by the consistency of NJ, see theorem 2.3.4.2.


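Of course, the above lemma does not state that, for a particular given
formula A, the formula $\neg A \vee A$ is not provable. For instance, with $A = \top$, we
have

\[
\frac{\dfrac{}{\vdash \top}\,(\top_I)}{\vdash \neg\top \vee \top}\,(\vee_I^r)
\]

It however states that we cannot prove $\neg A \vee A$ without knowing the details
of A. This will be studied in more detail in section 2.5, where other examples
of non-provable formulas are given.

Since the excluded middle is not provable, maybe it is false in our logic? That
is not the case because we can show that the excluded middle is not falsifiable
either, since we can prove the formula $\neg\neg(\neg A \vee A)$ as follows:

\[
\dfrac{
  \dfrac{
    \dfrac{}{\neg(\neg A \vee A) \vdash \neg(\neg A \vee A)}\,(\text{ax})
    \qquad
    \dfrac{
      \dfrac{
        \dfrac{
          \dfrac{}{\neg(\neg A \vee A), A \vdash \neg(\neg A \vee A)}\,(\text{ax})
          \qquad
          \dfrac{\dfrac{}{\neg(\neg A \vee A), A \vdash A}\,(\text{ax})}{\neg(\neg A \vee A), A \vdash \neg A \vee A}\,(\vee_I^r)
        }{\neg(\neg A \vee A), A \vdash \bot}\,(\neg_E)
      }{\neg(\neg A \vee A) \vdash \neg A}\,(\neg_I)
    }{\neg(\neg A \vee A) \vdash \neg A \vee A}\,(\vee_I^l)
  }{\neg(\neg A \vee A) \vdash \bot}\,(\neg_E)
}{\vdash \neg\neg(\neg A \vee A)}\,(\neg_I)
\]

This proof will be analyzed in more detail in section 2.5.2.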
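
Anticipating the correspondence between proofs and programs, the proof of the double negation of the excluded middle can be read as a small OCaml function. This is a sketch under assumed encodings, which are not introduced at this point of the text: falsity is an empty type, negation is a function into it, and disjunction is a sum type; the names empty, neg, sum and nn_lem are hypothetical.

```ocaml
(* The empty type plays the role of falsity: it has no constructor. *)
type empty = |

(* Negation: not A is coded as A => false. *)
type 'a neg = 'a -> empty

(* Disjunction, with one constructor for each introduction rule. *)
type ('a, 'b) sum = Left of 'a | Right of 'b

(* A proof of not (not (not A \/ A)): given k, a proof of not (not A \/ A),
   we first use it on the left branch, and the proof of not A that we provide
   there uses k again, this time on the right branch. *)
let nn_lem : ('a neg, 'a) sum neg neg =
  fun k -> k (Left (fun a -> k (Right a)))
```

The hypothesis k is used twice, mirroring the two axiom rules on the negated disjunction in the derivation above.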
A variant of the above lemma which is sometimes useful is the following one: 


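Lemma 2.3.5.3. Given a propositional variable X, the formula $\neg X \vee \neg\neg X$ cannot
be proved in NJ.


Proof. Let us prove this in a slightly different way than in lemma 2.3.5.2. It
can be proved in NJ that $\neg\top \Rightarrow \bot$:

\[
\frac{\dfrac{\dfrac{}{\neg\top \vdash \neg\top}\,(\text{ax}) \qquad \dfrac{}{\neg\top \vdash \top}\,(\top_I)}{\neg\top \vdash \bot}\,(\neg_E)}{\vdash \neg\top \Rightarrow \bot}\,(\Rightarrow_I)
\]

and that $\neg\neg\bot \Rightarrow \bot$:

\[
\frac{\dfrac{\vdots}{\neg\neg\bot \vdash \bot}\,(\neg_E)}{\vdash \neg\neg\bot \Rightarrow \bot}\,(\Rightarrow_I)
\]

Now, suppose that we have a proof of $\neg X \vee \neg\neg X$. By proposition 2.3.5.1, either
$\neg X$ or $\neg\neg X$ is provable. By proposition 2.2.11.1, either $\neg\top$ or $\neg\neg\bot$ is provable.
In both cases, by $(\Rightarrow_E)$, using the above proofs, $\bot$ is provable, which we know is
not the case by consistency, see theorem 2.3.4.2.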
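
\[
\frac{\dfrac{\overset{\pi}{\Gamma \vdash \bot}}{\Gamma \vdash A}\,(\bot_E) \qquad \ldots}{\Gamma \vdash B}\,(?_E)
\qquad\rightsquigarrow\qquad
\frac{\overset{\pi}{\Gamma \vdash \bot}}{\Gamma \vdash B}\,(\bot_E)
\]

\[
\frac{\dfrac{\overset{\pi}{\Gamma \vdash A \vee B} \qquad \overset{\pi'}{\Gamma, A \vdash C} \qquad \overset{\pi''}{\Gamma, B \vdash C}}{\Gamma \vdash C}\,(\vee_E) \qquad \ldots}{\Gamma \vdash D}\,(?_E)
\]

\[
\rightsquigarrow\qquad
\frac{\overset{\pi}{\Gamma \vdash A \vee B} \qquad \dfrac{\overset{\pi'}{\Gamma, A \vdash C} \qquad \ldots}{\Gamma, A \vdash D}\,(?_E) \qquad \dfrac{\overset{\pi''}{\Gamma, B \vdash C} \qquad \ldots}{\Gamma, B \vdash D}\,(?_E)}{\Gamma \vdash D}\,(\vee_E)
\]


Figure 2.3: Elimination of commutative cuts.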


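2.3.6 Commutative cuts. Are the cuts the only situations where one is doing
useless work in proofs? No. It turns out that falsity and disjunction induce some
more situations where we would like to eliminate “useless work”. For instance,
consider the following proof:

\[
\frac{\dfrac{\dfrac{}{\bot \vdash \bot}\,(\text{ax})}{\bot \vdash A \vee A}\,(\bot_E) \qquad \dfrac{}{\bot, A \vdash A}\,(\text{ax}) \qquad \dfrac{}{\bot, A \vdash A}\,(\text{ax})}{\bot \vdash A}\,(\vee_E)
\]

From the hypothesis $\bot$, we deduce the “general” statement that $A \vee A$ holds, from
which we deduce that A holds. Clearly, we ought to be able to simplify this
proof into

\[
\frac{\dfrac{}{\bot \vdash \bot}\,(\text{ax})}{\bot \vdash A}\,(\bot_E)
\]

where we directly prove A instead of using the “lemma” $A \vee A$ as an intermediate
step. Another example of such a situation is the following one: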


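\[
\frac{\dfrac{
  \dfrac{}{A, B \vee C \vdash B \vee C}\,(\text{ax})
  \qquad
  \dfrac{\dfrac{}{A, B \vee C, B \vdash A}\,(\text{ax}) \quad \dfrac{}{A, B \vee C, B \vdash A}\,(\text{ax})}{A, B \vee C, B \vdash A \wedge A}\,(\wedge_I)
  \qquad
  \dfrac{\dfrac{}{A, B \vee C, C \vdash A}\,(\text{ax}) \quad \dfrac{}{A, B \vee C, C \vdash A}\,(\text{ax})}{A, B \vee C, C \vdash A \wedge A}\,(\wedge_I)
}{A, B \vee C \vdash A \wedge A}\,(\vee_E)}{A, B \vee C \vdash A}\,(\wedge_E^l)
\]

Here, in a context containing A, we prove $A \wedge A$, from which we deduce A,
whereas we could have directly proved A instead. This is almost a typical
cut situation between the rules $(\wedge_E^l)$ and $(\wedge_I)$, except that we cannot eliminate
the cut because the two rules are separated by the intermediate rule $(\vee_E)$. In
order for the system to have nice properties, we should thus add to the usual
cut-elimination rules the rules of figure 2.3, where $(?_E)$ stands for an arbitrary
elimination rule. Those rules eliminate what we call commutative cuts, see
[Gir89, section 10.3].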

2.4 Proof search


An important question is whether there is an automated procedure to
perform proof search in NJ, i.e. to answer the question:


Is a given sequent $\Gamma \vdash A$ provable?


The answer is yes, but the problem is computationally hard (see section 2.4.2).
The basic idea of course consists in trying to construct a proof derivation whose
conclusion is our sequent, from the bottom up.


2.4.1 Reversible rules. A rule is reversible when, if its conclusion is provable,
then its premises are provable. Such rules are particularly convenient in order
to search for proofs since we know that we can always apply them: if the
conclusion sequent was provable then the premises still are. For instance, the
rule

\[
\frac{\Gamma \vdash A}{\Gamma \vdash A \vee B}\,(\vee_I^l)
\]

is not reversible: if, while searching for a proof of $\Gamma \vdash A \vee B$, we apply it,
we might have to backtrack in the case where $\Gamma \vdash A$ is not provable, since
maybe $\Gamma \vdash B$ was provable instead, the most extreme example being

\[
\frac{\vdash \bot}{\vdash \bot \vee \top}\,(\vee_I^l)
\]

where we have picked the wrong branch of the disjunction and try to prove $\bot$,
whereas $\top$ was directly provable. On the contrary, the rule

\[
\frac{\Gamma \vdash A \qquad \Gamma \vdash B}{\Gamma \vdash A \wedge B}
\]

is reversible: during proof search, we can apply it without regretting our choice.

Proposition 2.4.1.1. In NJ, the reversible rules are (ax), $(\Rightarrow_I)$, $(\wedge_I)$, $(\top_I)$ and
$(\neg_I)$.

Proof. Consider the case of $(\Rightarrow_I)$, the other cases being similar. In order to
show that this rule (recalled on the left) is reversible, we have to show that if
the conclusion is provable then the premise also is, i.e. that the rule on the right
is admissible:

\[
\frac{\Gamma, A \vdash B}{\Gamma \vdash A \Rightarrow B}\,(\Rightarrow_I)
\qquad\qquad
\frac{\Gamma \vdash A \Rightarrow B}{\Gamma, A \vdash B}
\]

Suppose that we have a proof $\pi$ of the conclusion $\Gamma \vdash A \Rightarrow B$. We can construct
a proof of $\Gamma, A \vdash B$ by

\[
\frac{\dfrac{\overset{\pi}{\Gamma \vdash A \Rightarrow B}}{\Gamma, A \vdash A \Rightarrow B}\,(\text{wk}) \qquad \dfrac{}{\Gamma, A \vdash A}\,(\text{ax})}{\Gamma, A \vdash B}\,(\Rightarrow_E)
\]
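
For instance, suppose that we want to prove the formula $X \Rightarrow Y \Rightarrow X \wedge Y$. We can try
to apply the reversible rules as long as we can, and indeed, we end up with a
proof:

\[
\frac{\dfrac{\dfrac{\dfrac{}{X, Y \vdash X}\,(\text{ax}) \qquad \dfrac{}{X, Y \vdash Y}\,(\text{ax})}{X, Y \vdash X \wedge Y}\,(\wedge_I)}{X \vdash Y \Rightarrow X \wedge Y}\,(\Rightarrow_I)}{\vdash X \Rightarrow Y \Rightarrow X \wedge Y}\,(\Rightarrow_I)
\]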


2.4.2 Proof search. Proof search can be automated in NJ: there is an algorithm
which, given a sequent, determines whether it is provable or not. We
describe here such an algorithm where, for simplicity, we restrict ourselves
to the implicational fragment (formulas are built out of variables and implication,
and the rules are (ax), $(\Rightarrow_E)$ and $(\Rightarrow_I)$).

Suppose that we are trying to determine whether a given sequent $\Gamma \vdash A$
is provable. It can be observed that, depending on the formula A (which is
either of the form $B \Rightarrow C$ or a variable X), we can always look for proofs of the
following form:


(a) $\Gamma \vdash B \Rightarrow C$: the last rule is

\[
\frac{\Gamma, B \vdash C}{\Gamma \vdash B \Rightarrow C}\,(\Rightarrow_I)
\]

and we look for a proof of $\Gamma, B \vdash C$,

(b) $\Gamma \vdash X$: the proof ends with

\[
\frac{\dfrac{\dfrac{}{\Gamma \vdash A_1 \Rightarrow \ldots \Rightarrow A_n \Rightarrow X}\,(\text{ax}) \qquad \Gamma \vdash A_1}{\begin{array}{c}\Gamma \vdash A_2 \Rightarrow \ldots \Rightarrow A_n \Rightarrow X\\ \vdots\\ \Gamma \vdash A_n \Rightarrow X\end{array}}\,(\Rightarrow_E) \qquad \Gamma \vdash A_n}{\Gamma \vdash X}\,(\Rightarrow_E)
\]

where the particular case n = 0 is

\[
\frac{}{\Gamma \vdash X}\,(\text{ax})
\]

and we thus try to find in the context a formula of the form

\[ A_1 \Rightarrow \ldots \Rightarrow A_n \Rightarrow X \]

such that all the $\Gamma \vdash A_i$ are provable.


Namely, the first case is justified by the fact that $(\Rightarrow_I)$ is reversible so that it
can always be applied first, and the second one by the fact that we can look for
cut-free proofs (theorem 2.3.3.1) so that we can restrict to the cases where the
rules $(\Rightarrow_E)$ have a principal premise which is a rule (ax) or $(\Rightarrow_E)$, but not $(\Rightarrow_I)$.

This suggests the following procedure to determine the provability of a given
sequent $\Gamma \vdash A$:
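
— if A is of the form $B \Rightarrow C$, we recursively try to prove the sequent $\Gamma, B \vdash C$,


— if A is a variable X, we try to find in the context $\Gamma$ a formula of the form
$A_1 \Rightarrow \ldots \Rightarrow A_n \Rightarrow X$ such that all the sequents $\Gamma \vdash A_i$ are provable,
which can be tested recursively.


The problem with this procedure is that it might not terminate. For instance,
given the sequent $X \Rightarrow X \vdash X$, the procedure will loop, trying to construct an
infinite proof tree of the form

\[
\frac{\dfrac{}{X \Rightarrow X \vdash X \Rightarrow X}\,(\text{ax}) \qquad \dfrac{\dfrac{}{X \Rightarrow X \vdash X \Rightarrow X}\,(\text{ax}) \qquad \dfrac{\vdots}{X \Rightarrow X \vdash X}}{X \Rightarrow X \vdash X}\,(\Rightarrow_E)}{X \Rightarrow X \vdash X}\,(\Rightarrow_E)
\]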

In order to prevent this kind of loop, we should ensure that, whenever we are
trying to construct a proof of a sequent $\Gamma \vdash A$, we never try to construct again
a proof of $\Gamma \vdash A$ at a later stage, and this is easily done by remembering all
the sequents encountered during proof search. An actual implementation is
provided in figure 2.4, where we use a list seen to remember the sequents that
we have already seen, a sequent being encoded as a pair consisting of a context
(a list of formulas) and a formula.

Writing $\Gamma$ for the context $X \Rightarrow Y, (X \Rightarrow Y) \Rightarrow X$, our algorithm will find
that $\Gamma \vdash Y$ is provable because there is the proof

\[
\frac{\dfrac{}{\Gamma \vdash X \Rightarrow Y}\,(\text{ax}) \qquad \dfrac{\dfrac{}{\Gamma \vdash (X \Rightarrow Y) \Rightarrow X}\,(\text{ax}) \qquad \dfrac{\dfrac{\dfrac{}{\Gamma, X \vdash X \Rightarrow Y}\,(\text{ax}) \qquad \dfrac{}{\Gamma, X \vdash X}\,(\text{ax})}{\Gamma, X \vdash Y}\,(\Rightarrow_E)}{\Gamma \vdash X \Rightarrow Y}\,(\Rightarrow_I)}{\Gamma \vdash X}\,(\Rightarrow_E)}{\Gamma \vdash Y}\,(\Rightarrow_E)
\]

Note that we are trying to prove Y twice during the proof search, but this is
authorized because this is done in different contexts (respectively in the contexts
$\Gamma$ and $\Gamma, X$). As can be observed in the above example, when looking for a
proof of a sequent $\Gamma \vdash A$, the contexts can grow during proof search. Termination
is however still guaranteed because it can be shown that all the formulas that
we add to the context are strict subformulas of formulas occurring in the original
sequent, and there are only a finite number of those. The algorithm can be shown to be in
PSPACE (i.e. it requires an amount of memory which is polynomial in the size
of its input) and the problem is actually PSPACE-complete (it is at least as hard as
any other problem in PSPACE, which in particular implies that it is at least as hard
as any problem in NP), see [Sta79] and [SU06, section 6.6]. Other methods
for performing proof search in intuitionistic logic are presented in section 2.6.5.


2.5 Classical logic 


As we have seen in section 2.3.5, not all the formulas that we expect to hold in 
logic are provable in intuitionistic logic, such as the excluded middle (lemma 2.3.5.2). 

(** Formulas. *)
type t =
  | Var of string
  | Imp of t * t

(** Split arguments and target of implications. *)
let rec split_imp = function
  | Var x -> [], Var x
  | Imp (a, b) ->
    let args, tgt = split_imp b in
    a::args, tgt

(** Determine whether a sequent is provable in a given context. *)
let rec provable seen env a =
  (* Give up if this sequent was already encountered. *)
  not (List.mem (env,a) seen) &&
  let seen = (env,a)::seen in
  match a with
  | Var x ->
    (* Look for a hypothesis with target x whose arguments are all provable. *)
    List.exists (fun a ->
        let args, b = split_imp a in
        b = Var x && List.for_all (provable seen env) args
      ) env
  | Imp (a, b) -> provable seen (a::env) b

let provable = provable []


Figure 2.4: Deciding provability in intuitionistic logic.




In contrast, the usual notion of validity (e.g. coming from boolean models) is 
called classical logic. If classical logic is closer to the usual intuition of validity, 
the main drawback for us is that this logic is not constructive, in the sense that 
we cannot necessarily extract witnesses from proofs: if we have proved $\neg A \vee A$,
we do not necessarily know which one of $\neg A$ or A actually holds.
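
Validity in boolean models can be tested mechanically with truth tables. The following OCaml sketch, restricted for brevity to variables, falsity, implication and disjunction, enumerates all the valuations of the variables of a formula; the names formula, neg, vars, eval, valuations and valid are assumptions of this illustration.

```ocaml
(* Formulas (restricted to a few connectives for brevity). *)
type formula =
  | Var of string
  | Bot
  | Imp of formula * formula
  | Or of formula * formula

(* Negation is coded as implication of falsity. *)
let neg a = Imp (a, Bot)

(* Variables occurring in a formula, without duplicates. *)
let rec vars = function
  | Var x -> [x]
  | Bot -> []
  | Imp (a, b) | Or (a, b) ->
    let va = vars a in
    va @ List.filter (fun x -> not (List.mem x va)) (vars b)

(* Truth value of a formula, a valuation being the list of true variables. *)
let rec eval rho = function
  | Var x -> List.mem x rho
  | Bot -> false
  | Imp (a, b) -> not (eval rho a) || eval rho b
  | Or (a, b) -> eval rho a || eval rho b

(* All the valuations on a list of variables, i.e. all its sublists. *)
let rec valuations = function
  | [] -> [[]]
  | x :: xs ->
    let vs = valuations xs in
    vs @ List.map (fun v -> x :: v) vs

(* A formula is classically valid when it is true under every valuation. *)
let valid a = List.for_all (fun rho -> eval rho a) (valuations (vars a))
```

With this definition, valid (Or (neg (Var "X"), Var "X")) holds, although we have seen in lemma 2.3.5.2 that the corresponding formula is not provable in NJ.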

A well-known typical classical reasoning is the following. We want to prove
that there exist two irrational numbers a and b such that $a^b$ is rational. We
know that $\sqrt{2}$ is irrational: if $\sqrt{2} = p/q$ then $p^2 = 2q^2$, but the number of prime
factors (counted with multiplicity) is even on the left and odd on the right.
Reasoning using the excluded middle, we know that the number $\sqrt{2}^{\sqrt{2}}$ is either
rational or irrational:


— if it is rational, we conclude with $a = b = \sqrt{2}$,


— otherwise, we take $a = \sqrt{2}^{\sqrt{2}}$ and $b = \sqrt{2}$, and we have $a^b = 2$ which
concludes the proof.


We have been able to prove the property, but we are not able to exhibit a
concrete value for a and b.
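
For completeness, the equality used in the second case follows from the usual rules for exponents:

```latex
\[
\left(\sqrt{2}^{\sqrt{2}}\right)^{\sqrt{2}}
= \sqrt{2}^{\sqrt{2} \cdot \sqrt{2}}
= \sqrt{2}^{2}
= 2
\]
```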

From the point of view of the proof-as-program correspondence, the excluded
middle is also quite puzzling. Suppose that we are in a logic rich enough to encode
Turing machines (or, equivalently, execute a program in a usual programming
language) and that we have a predicate Halts(M) which holds when M is halting
(you should find this quite plausible after having read chapter 6). In classical
logic, the formula

\[ \neg \mathrm{Halts}(M) \vee \mathrm{Halts}(M) \]


holds for every Turing machine M, which seems to mean that we should be able 
to decide whether a Turing machine is halting or not, but there is no hope of 
finding such an algorithm since Turing has shown that the halting problem is 
undecidable [Tur37]. 


2.5.1 Axioms for classical logic. A logical system for classical logic, called
NK (for Klassical Natural deduction), can be obtained from NJ (figure 2.1) by
adding a new rule corresponding to the excluded middle

\[
\frac{}{\Gamma \vdash \neg A \vee A}\,(\text{lem})
\]

In this sense, the excluded middle is the only thing which is missing in
intuitionistic logic to be classical. This is shown in theorems 2.5.6.1 and 2.5.6.5.

In fact, excluded middle is not the only possible choice, and other equivalent
axioms can be added instead. Most of those axioms correspond to usual
reasoning patterns, which have been known for a long time, and thus bear Latin
names.


Theorem 2.5.1.1. The following principles are equivalent in NJ:


(i) excluded middle, also called tertium non datur:

\[ \neg A \vee A \]

(ii) double-negation elimination or reductio ad absurdum:

\[ \neg\neg A \Rightarrow A \]

(iii) contraposition:

\[ (\neg B \Rightarrow \neg A) \Rightarrow (A \Rightarrow B) \]

(iv) counter-example principle:

\[ \neg(A \Rightarrow B) \Rightarrow A \wedge \neg B \]

(v) Peirce’s law:

\[ ((A \Rightarrow B) \Rightarrow A) \Rightarrow A \]

(vi) Clavius’ law or consequentia mirabilis:

\[ (\neg A \Rightarrow A) \Rightarrow A \]

(vii) Tarski’s formula:

\[ A \vee (A \Rightarrow B) \]

(viii) one of the following de Morgan laws:

\[ \neg(\neg A \wedge \neg B) \Rightarrow A \vee B \]
\[ \neg(\neg A \vee \neg B) \Rightarrow A \wedge B \]

(ix) material implication:

\[ (A \Rightarrow B) \Rightarrow (\neg A \vee B) \]

(x) $\Rightarrow$/$\vee$ distributivity:

\[ (A \Rightarrow (B \vee C)) \Rightarrow ((A \Rightarrow B) \vee C) \]

By “equivalent” we mean here that if we suppose that one holds for all
formulas A, B and C then the other one also holds for all formulas A, B
and C, and conversely.


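Proof. We only show here the equivalence between the first two, the other ones
being left as an exercise. Supposing that the excluded middle holds, we can
show reductio ad absurdum by

\[
\frac{\dfrac{\dfrac{}{\neg\neg A \vdash \neg A \vee A}\,(\text{lem}) \qquad \dfrac{\dfrac{\dfrac{}{\neg\neg A, \neg A \vdash \neg\neg A}\,(\text{ax}) \qquad \dfrac{}{\neg\neg A, \neg A \vdash \neg A}\,(\text{ax})}{\neg\neg A, \neg A \vdash \bot}\,(\neg_E)}{\neg\neg A, \neg A \vdash A}\,(\bot_E) \qquad \dfrac{}{\neg\neg A, A \vdash A}\,(\text{ax})}{\neg\neg A \vdash A}\,(\vee_E)}{\vdash \neg\neg A \Rightarrow A}\,(\Rightarrow_I)
\tag{2.2}
\]

Supposing that reductio ad absurdum holds, we can show the excluded middle
by

\[
\frac{\vdash \neg\neg(\neg A \vee A) \Rightarrow (\neg A \vee A) \qquad \overset{\pi}{\vdash \neg\neg(\neg A \vee A)}}{\vdash \neg A \vee A}\,(\Rightarrow_E)
\tag{2.3}
\]

where $\pi$ is the proof of $\neg\neg(\neg A \vee A)$ given on page 63.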

Remark 2.5.1.2. One should be careful about the quantifications over formulas
involved in theorem 2.5.1.1. In order to illustrate this, let us detail the
equivalence between excluded middle and reductio ad absurdum. We say that
a formula A is decidable when $\neg A \vee A$ holds and stable when $\neg\neg A \Rightarrow A$ holds.
The derivation (2.2) shows that every decidable formula is stable, but the converse
does not hold: the derivation (2.3) only shows that A is decidable when
$\neg A \vee A$ (as opposed to A) is stable. In fact a concrete example of a formula
which is stable but not decidable can be given by taking $A = \neg X$: the formula
$\neg\neg\neg X \Rightarrow \neg X$ holds (lemma 2.5.9.4), but $\neg\neg X \vee \neg X$ cannot be proved
(lemma 2.3.5.3). Thus, it is important to note that theorem 2.5.1.1 does not
say that a formula is stable if and only if it is decidable, but rather that every
formula is stable if and only if every formula is decidable.
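
The fact that every decidable formula is stable, i.e. the derivation (2.2), can be read as an OCaml program under the same kind of encoding as before: falsity is an empty type, negation a function into it and disjunction a sum type. This is only a sketch, and the names empty, absurd, neg, sum and stable_of_decidable are hypothetical.

```ocaml
(* The empty type plays the role of falsity. *)
type empty = |

(* Elimination of falsity: from a proof of falsity, anything follows. *)
let absurd (x : empty) : 'a = match x with _ -> .

(* Negation and disjunction. *)
type 'a neg = 'a -> empty
type ('a, 'b) sum = Left of 'a | Right of 'b

(* If A is decidable (we have not A \/ A) then A is stable (not not A => A). *)
let stable_of_decidable (d : ('a neg, 'a) sum) : 'a neg neg -> 'a =
  fun nna ->
    match d with
    | Left na -> absurd (nna na)   (* not A contradicts not not A *)
    | Right a -> a                 (* we directly have a proof of A *)
```

The two branches of the pattern matching correspond to the two branches of the elimination of disjunction in (2.2). Note that the argument d is a proof of decidability for the particular formula A under consideration, in accordance with the remark above.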


Among those axioms, Peirce’s law is less natural than others but has the advantage
of requiring only implication, so that it still makes sense in some small
fragments of logic such as implicational logic. Also note that the fact that material
implication occurs in this list means that $A \Rightarrow B$ is not equivalent to
$\neg A \vee B$ in NJ, in contrast to NK. For each of these axioms, we could add more
or less natural forms of rules. For instance, the law of the excluded middle can
also be implemented by the nicer looking rule

\[
\frac{\Gamma, \neg A \vdash B \qquad \Gamma, A \vdash B}{\Gamma \vdash B}\,(\text{lem})
\]

similarly, reductio ad absurdum can be implemented by one of the following
rules

\[
\frac{}{\Gamma \vdash \neg\neg A \Rightarrow A}\,(\neg\neg_E)
\qquad
\frac{\Gamma \vdash \neg\neg A}{\Gamma \vdash A}\,(\neg\neg_E)
\qquad
\frac{\Gamma, \neg A \vdash \bot}{\Gamma \vdash A}\,(\neg\neg_E)
\]


Since classical logic is obtained from intuitionistic by adding axioms, it is 
obvious that 


Lemma 2.5.1.3. An intuitionistic proof is a valid classical proof. 


We have seen that the converse does not hold (lemma 2.3.5.2), but we will see 
in section 2.5.9 that we can still embed classical proofs in intuitionistic logic. 


2.5.2 The intuition behind classical logic. Let us try to give some proof
theoretic intuition about how classical logic works.


Proof irrelevance. We have already mentioned that we can interpret a formula
as a set $\llbracket A \rrbracket$, intuitively corresponding to all the proofs of A, and implications
as function spaces: $\llbracket A \Rightarrow B \rrbracket = \llbracket A \rrbracket \to \llbracket B \rrbracket$. In this interpretation, $\bot$ of course
corresponds to the empty set since we do not expect to have a proof of it:
$\llbracket \bot \rrbracket = \emptyset$. Now, given a formula A, its negation $\neg A = (A \Rightarrow \bot)$ is interpreted as
the set of functions from $\llbracket A \rrbracket$ to $\emptyset$:


— if $\llbracket A \rrbracket$ is non-empty, the set $\llbracket \neg A \rrbracket = \llbracket A \rrbracket \to \emptyset$ is empty,


— if $\llbracket A \rrbracket$ is empty, the set $\llbracket \neg A \rrbracket = \emptyset \to \emptyset$ contains exactly one element.


The last point might seem surprising, but if we think hard about it, it makes
sense. For instance, in set theory, a function $f : \llbracket A \rrbracket \to \llbracket B \rrbracket$ is usually defined as
a relation $f \subseteq \llbracket A \rrbracket \times \llbracket B \rrbracket$ which satisfies some properties, expressing that each
element of $\llbracket A \rrbracket$ should have exactly one image in $\llbracket B \rrbracket$. Now, when both the sets
are empty, we are looking for a relation $f \subseteq \emptyset \times \emptyset$ and there is exactly one such
relation: the empty set (which trivially satisfies the axioms for functions).
Applying twice the reasoning above, we get that


— if $\llbracket A \rrbracket$ is non-empty, $\llbracket \neg\neg A \rrbracket$ contains exactly one element,
— if $\llbracket A \rrbracket$ is empty, $\llbracket \neg\neg A \rrbracket$ is empty.
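
When the sets involved are finite, this counting argument can be checked by a small computation: there are b^a functions from a set with a elements to a set with b elements. The following OCaml sketch relies on this assumption of finiteness, which is only made here for illustration, and the function names are hypothetical.

```ocaml
(* Number of functions from a set with a elements to a set with b elements. *)
let rec count_functions a b =
  if a = 0 then 1 else b * count_functions (a - 1) b

(* Number of elements of the interpretation of not A,
   when the interpretation of A has a elements. *)
let count_neg a = count_functions a 0

(* Same for the double negation of A. *)
let count_neg_neg a = count_neg (count_neg a)
```

For instance, count_neg 0 is 1 and count_neg 3 is 0, whereas count_neg_neg 3 is 1 and count_neg_neg 0 is 0: the double negation only remembers whether the original set was empty or not.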


In other words, $\neg\neg A$ can be seen as the formula A where the only thing which
matters is not all the proofs of A (i.e. the elements of $\llbracket A \rrbracket$), but only whether
there exists a proof of A or not, since we have reduced its contents to at most
one point. For this reason, doubly negated formulas are sometimes said to be
proof irrelevant: again, the actual proof does not matter, only its existence. For
instance, we now understand why

\[ \neg\neg(\neg A \vee A) \]

is provable intuitionistically (see page 63): it states that it is true that there
exists a proof of $\neg A$ or a proof of A, as opposed to $\neg A \vee A$ which states that we
have a proof of $\neg A$ or a proof of A. From this point of view, the classical axiom

\[ \neg\neg A \Rightarrow A \]


now seems like deep magic: it means that if we know that there exists a proof 
of A, we can actually extract a proof of A. This can only be true if we assume 
that there can be at most one proof for a formula, i.e. formulas are interpreted 
as booleans and not sets (see section 2.5.4 for a logical point of view on this). 
This also explains why we can actually embed classical logic into intuitionistic 
logic by double-negating formulas, see section 2.5.9: if we are only interested in 
their existence, intuitionistic proofs behave classically. 


Resetting proofs. Let us give another, more operational, point of view on the
axiom ¬¬A ⇒ A. We have mentioned that it is equivalent to having the rule

     Γ ⊢ ¬¬A
    --------- (¬¬E)
     Γ ⊢ A

so that when searching for a proof of A, we can instead prove ¬¬A. What do
we gain in doing so? At first it does not seem much, since we can go back to
proving A:

    ------------ (ax)
    Γ, ¬A ⊢ ¬A        Γ, ¬A ⊢ A
    ---------------------------- (¬E)
             Γ, ¬A ⊢ ⊥
             ---------- (¬I)
             Γ ⊢ ¬¬A


But there is one difference: we now have the additional hypothesis ¬A in our
context, and we can use it at any point in the proof to go back to proving A
instead of the current goal B, while keeping the current context:

    ------------- (ax)
    Γ′, ¬A ⊢ ¬A        Γ′, ¬A ⊢ A
    ------------------------------ (¬E)
             Γ′, ¬A ⊢ ⊥
             ----------- (⊥E)
             Γ′, ¬A ⊢ B

In other words, we can “reset proofs” during proof search, i.e. we can implement
the following behavior (up to minor details such as weakening):

     Γ′ ⊢ A
    -------- (reset)
     Γ′ ⊢ B
       ⋮
     Γ ⊢ A

Note that we keep the context Γ′ after the reset.

Now, let us show how we can use this to prove ¬A ∨ A. When faced with the
disjunction, we choose the left branch, i.e. prove ¬A, which by (¬I) amounts
to proving ⊥, supposing A as hypothesis. Instead of going on and proving ⊥,
which is quite hopeless, we use our reset mechanism and go back to proving
¬A ∨ A: while doing so we have kept A as hypothesis! So, this time we choose to
prove A, which can be done by an axiom. If we think of reset as the possibility
of “going back in time” and changing one’s mind, this proof implements the
following conversation between us, trying to build the proof, and an opponent
trying to prove us wrong:

- Show me the formula ¬A ∨ A.
- Ok, I will show that ¬A holds.
- Here is a proof π of A, show me how to deduce ⊥.
- Actually, I changed my mind, I will prove A, here is the proof: π.


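The formal proof goes like this:

    ------------ (ax)
     A ⊢ A
    ------------ (∨ʳI)
     A ⊢ ¬A ∨ A
    ------------ (reset)
     A ⊢ ⊥
    ------------ (¬I)
     ⊢ ¬A
    ------------ (∨ˡI)
     ⊢ ¬A ∨ A

In more detail, the proof begins by proving ¬¬(¬A ∨ A) instead of ¬A ∨ A and
then proceeds as in the proof given on page 63. This idea of resetting will be
explored again, in a different form, in section 4.6.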
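
Read through the correspondence between proofs and programs, the intuitionistic proof of ¬¬(¬A ∨ A) is a small program in which the reset shows up as a second call to the same continuation. The following OCaml sketch is only an illustration: the type sum, the names nnem, k and answer, and the use of an arbitrary answer type 'r in place of ⊥ (so that the function can actually be run) are assumptions made here, not notations from the text.

```ocaml
(* Disjunction A ∨ B as a sum type. *)
type ('a, 'b) sum = Left of 'a | Right of 'b

(* A term of type ¬¬(¬A ∨ A), where ¬B is read as B -> 'r.
   The opponent k is first given the left branch (a "proof" of ¬A);
   as soon as that proof receives some a : A, we change our mind and
   call k again with the right branch, this time holding a. *)
let nnem (k : ('a -> 'r, 'a) sum -> 'r) : 'r =
  k (Left (fun a -> k (Right a)))

(* An opponent that challenges the left branch with the value 42
   and returns the successor of what it is given on the right branch. *)
let answer = nnem (function Left f -> f 42 | Right a -> a + 1)
```

Here answer evaluates to 43: the challenge 42 of the opponent is fed back to it through the second call, which is the change of mind of the conversation above.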
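
    ----------------- (ax)
    Γ, A, Γ′ ⊢ A, Δ

    Γ ⊢ A ⇒ B, Δ    Γ ⊢ A, Δ              Γ, A ⊢ B, Δ
    -------------------------- (⇒E)       -------------- (⇒I)
    Γ ⊢ B, Δ                              Γ ⊢ A ⇒ B, Δ

    Γ ⊢ A ∧ B, Δ             Γ ⊢ A ∧ B, Δ             Γ ⊢ A, Δ    Γ ⊢ B, Δ
    ------------- (∧ˡE)      ------------- (∧ʳE)      ---------------------- (∧I)
    Γ ⊢ A, Δ                 Γ ⊢ B, Δ                 Γ ⊢ A ∧ B, Δ

    ---------- (⊤I)
    Γ ⊢ ⊤, Δ

    Γ ⊢ A ∨ B, Δ    Γ, A ⊢ C, Δ    Γ, B ⊢ C, Δ           Γ ⊢ A, B, Δ
    -------------------------------------------- (∨E)    ------------- (∨I)
    Γ ⊢ C, Δ                                             Γ ⊢ A ∨ B, Δ

    Γ ⊢ ¬A, Δ    Γ ⊢ A, Δ                 Γ, A ⊢ ⊥, Δ
    ----------------------- (¬E)          ------------- (¬I)
    Γ ⊢ ⊥, Δ                              Γ ⊢ ¬A, Δ

structural rules:

    Γ ⊢ Δ, A, B, Δ′                  Γ ⊢ Δ, Δ′
    ---------------- (xchR)         -------------- (wkR)
    Γ ⊢ Δ, B, A, Δ′                  Γ ⊢ Δ, A, Δ′

    Γ ⊢ Δ, A, A, Δ′                  Γ ⊢ Δ, ⊥, Δ′
    ---------------- (contrR)       -------------- (⊥R)
    Γ ⊢ Δ, A, Δ′                     Γ ⊢ Δ, Δ′

Figure 2.5: NK: rules of classical natural deduction.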


2.5.3 A variant of natural deduction. The presentation given in section 2.5.1
is not very “canonical” in the sense that it amounts to arbitrarily adding an
axiom from the list given in theorem 2.5.1.1. We would like to present another
approach which consists in slightly changing the calculus, and allows for a much
more pleasant proof system. We now consider sequents of the form

    Γ ⊢ Δ

where both Γ and Δ are contexts. Such a sequent should be read as “supposing
all the formulas in Γ, I can prove some formula in Δ”. This is a generalization of
previous sequents, where Δ was restricted to exactly one formula. The rules for
this sequent calculus are given in figure 2.5. In order to simplify the presentation,
we consider here that the formulas of Δ can be explicitly permuted, duplicated,
and so on, using the structural rules (xchR), (wkR), (contrR) and (⊥R), which
we generally leave implicit in examples. Those rules are essentially the same as
those for NJ, with contexts added on the right, except for the rules (∨ˡI) and
(∨ʳI), which are now combined into the rule

     Γ ⊢ A, B, Δ
    -------------- (∨I)
     Γ ⊢ A ∨ B, Δ

In order to prove a disjunction A ∨ B, we do not have to choose anymore whether
we want to prove A or B: we can try to prove both at the same time. This means
that there can be some “exchange of information” between the proofs of A and
of B (via the context Γ). For instance, we have that the excluded middle can
be proved by

    ---------- (ax)
     A ⊢ ⊥, A
    ---------- (¬I)
     ⊢ ¬A, A
    ---------- (∨I)
     ⊢ ¬A ∨ A

Note that the formula A in the context, obtained from the ¬A, is used in the
axiom in order to prove the other A. Similarly, double negation elimination is
proved by

                               --------------- (ax)
                                ¬¬A, A ⊢ A
                               --------------- (wkR)
    ---------------- (ax)       ¬¬A, A ⊢ ⊥, A
     ¬¬A ⊢ ¬¬A, A              --------------- (¬I)
                                ¬¬A ⊢ ¬A, A
    ------------------------------------------- (¬E)
                   ¬¬A ⊢ ⊥, A
                   ----------- (⊥E)
                   ¬¬A ⊢ A, A
                   ----------- (contrR)
                   ¬¬A ⊢ A
                   ----------- (⇒I)
                   ⊢ ¬¬A ⇒ A

Again, instead of proving A, we decide to either prove ⊥ or A.
The expected elimination rule for the constant ⊥ (shown on the left) is not
present, but it can be derived (as shown on the right):

                                Γ ⊢ ⊥, Δ
                               ---------- (⊥R)
     Γ ⊢ ⊥, Δ                   Γ ⊢ Δ
    ---------- (⊥E)            ---------- (wkR)
     Γ ⊢ A, Δ                   Γ ⊢ A, Δ

In fact the constant ⊥ is now superfluous, since one can convince oneself that
proving ⊥ amounts to proving the empty sequent Δ (a proof of Γ ⊢ ⊥, Δ can be
turned into one of Γ ⊢ Δ by (⊥R), and back by (wkR)).


2.5.4 Cut-elimination in classical logic. Classical logic also has the
cut-elimination property, see section 2.3.3, although this is more subtle to show
than in the case of intuitionistic logic due to the presence of structural rules. In
particular, in addition to the usual cut elimination steps, we need to add rules
making elimination rules “commute” with structural rules: namely, an
introduction and the corresponding elimination rules can be separated by structural
rules. For instance, suppose that we want to eliminate the following “cut”:

       π            π′
     Γ ⊢ A        Γ ⊢ B
    -------------------- (∧I)
         Γ ⊢ A ∧ B
    -------------------- (wkR)
         Γ ⊢ A ∧ B, C
    -------------------- (∧ˡE)
         Γ ⊢ A, C


We first need to make the elimination rule for conjunction commute with the
weakening:

       π            π′
     Γ ⊢ A        Γ ⊢ B
    -------------------- (∧I)
         Γ ⊢ A ∧ B
    -------------------- (∧ˡE)
         Γ ⊢ A
    -------------------- (wkR)
         Γ ⊢ A, C


and then we can finally properly eliminate the cut:

       π
     Γ ⊢ A
    ---------- (wkR)
     Γ ⊢ A, C


Another surprising phenomenon was observed by Lafont [Gir89, section B.1].
Depending on the order in which we eliminate cuts, the following proof

       π            π′
     Γ ⊢ A        Γ ⊢ B
       ⋮            ⋮
    --------------------
         Γ ⊢ A, B

cut-eliminates to both

       π                        π′
     Γ ⊢ A                    Γ ⊢ B
    ---------- (wkR)   and   ---------- (wkR)
     Γ ⊢ A, B                 Γ ⊢ A, B


This is sometimes called Lafont’s critical pair. We like to identify proofs up to
cut elimination (much more on this in chapter 4) and therefore those two proofs
should be considered as being “the same”. In particular, when both π and π′
are proofs of Γ ⊢ A, i.e. A = B, this forces us to identify the two proofs

       π                           π′
     Γ ⊢ A                       Γ ⊢ A
    ---------- (wkR)            ---------- (wkR)
     Γ ⊢ A, A          and       Γ ⊢ A, A
    ---------- (contrR)         ---------- (contrR)
     Γ ⊢ A                       Γ ⊢ A

and thus to identify the two proofs π and π′. More generally, by similar
reasoning, any two proofs of a same sequent Γ ⊢ Δ should be identified. Cuts can
hurt! This gives another, purely logical, explanation of why classical logic is
“proof irrelevant”, as already mentioned in section 2.5.2: up to cut-elimination,
there is at most one proof of a given sequent.

2.5.5 De Morgan laws. In classical logic, the well-known de Morgan laws
hold:

    ¬(A ∧ B) ⇔ ¬A ∨ ¬B        ¬⊤ ⇔ ⊥        A ⇒ B ⇔ ¬A ∨ B
    ¬(A ∨ B) ⇔ ¬A ∧ ¬B        ¬⊥ ⇔ ⊤        ¬¬A ⇔ A


Definable connectives. Because of these laws, many connectives are superfluous.
For instance, classical logic can be axiomatized with ⇒ and ⊥ as the only
connectives, since we can define

    A ∨ B = ¬A ⇒ B      A ∧ B = ¬(A ⇒ ¬B)      ¬A = A ⇒ ⊥      ⊤ = ⊥ ⇒ ⊥

and the logical system can be reduced to the following four rules:

                                           Γ ⊢ ⊥, Δ
    ----------------- (ax)                ---------- (⊥E)
    Γ, A, Γ′ ⊢ A, Δ                        Γ ⊢ A, Δ

    Γ ⊢ A ⇒ B, Δ    Γ ⊢ A, Δ              Γ, A ⊢ B, Δ
    -------------------------- (⇒E)       -------------- (⇒I)
    Γ ⊢ B, Δ                              Γ ⊢ A ⇒ B, Δ

together with the four structural rules. Several other choices of connectives are
possible.
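
The four definitions above can be turned into a translation on formulas, and their classical correctness checked on truth tables. The following OCaml sketch does this for formulas over two variables; the constructor names and the functions encode, eval and agrees are chosen for this illustration only and are not part of the text.

```ocaml
(* Formulas with all the usual connectives. *)
type t =
  | Var of int
  | Imp of t * t
  | And of t * t
  | Or of t * t
  | Not of t
  | True | False

(* Rewrite a formula so that it only uses Var, Imp and False. *)
let rec encode = function
  | Var x -> Var x
  | False -> False
  | True -> Imp (False, False)                         (* ⊤ = ⊥ ⇒ ⊥ *)
  | Not a -> Imp (encode a, False)                     (* ¬A = A ⇒ ⊥ *)
  | Imp (a, b) -> Imp (encode a, encode b)
  | Or (a, b) -> Imp (Imp (encode a, False), encode b) (* A ∨ B = ¬A ⇒ B *)
  | And (a, b) ->                                      (* A ∧ B = ¬(A ⇒ ¬B) *)
    Imp (Imp (encode a, Imp (encode b, False)), False)

(* Boolean evaluation under a valuation rho : int -> bool. *)
let rec eval rho = function
  | Var x -> rho x
  | Imp (a, b) -> not (eval rho a) || eval rho b
  | And (a, b) -> eval rho a && eval rho b
  | Or (a, b) -> eval rho a || eval rho b
  | Not a -> not (eval rho a)
  | True -> true
  | False -> false

(* Check that a formula and its encoding agree on the four valuations
   of the variables 0 and 1. *)
let agrees a =
  List.for_all
    (fun (x, y) ->
       let rho v = if v = 0 then x else y in
       eval rho a = eval rho (encode a))
    [false, false; false, true; true, false; true, true]
```

For instance, agrees (And (Var 0, Or (Var 1, Not (Var 0)))) compares the two truth tables line by line. Such a check only covers classical (boolean) equivalence, which is the setting of this section.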


Clausal form. It is natural to consider the equivalence relation on formulas which
identifies any two formulas A and B such that A ⇔ B. The de Morgan laws can
be used to rewrite every formula into a canonical representative of its equivalence
class induced by this equivalence relation. We first need to introduce some
classes of formulas.

A literal L is either a variable or a negated variable:

    L ::= X | ¬X

A clause C is a disjunction of literals:

    C ::= L | C ∨ C | ⊥

A formula A is in clausal form or in conjunctive normal form when it is a
conjunction of clauses:

    A ::= C | A ∧ A | ⊤

Proposition 2.5.5.1. Every formula is equivalent to one in clausal form. 
One way to show this result is to use the de Morgan laws, as well as usual
intuitionistic laws (section 2.2.5), in order to push negations toward variables
and disjunctions below conjunctions, i.e. we replace subformulas according to
the following rules, until no rule applies:

    ¬(A ∧ B) ⇝ ¬A ∨ ¬B                  ¬⊤ ⇝ ⊥
    ¬(A ∨ B) ⇝ ¬A ∧ ¬B                  ¬⊥ ⇝ ⊤
    (A ∧ B) ∨ C ⇝ (A ∨ C) ∧ (B ∨ C)     ⊤ ∨ C ⇝ ⊤
    A ∨ (B ∧ C) ⇝ (A ∨ B) ∧ (A ∨ C)     A ∨ ⊤ ⇝ ⊤
    A ⇒ B ⇝ ¬A ∨ B                      ¬¬A ⇝ A

Those rules rewrite formulas into classically equivalent ones, since they are
instances of the de Morgan laws or of the usual intuitionistic laws. However, it
is not clear that the process terminates. It does, but it is not efficient, and we
will see below a better way to rewrite a formula in clausal form.


Example 2.5.5.2. A clausal form of (X ⇒ Y) ⇒ (Y ⇒ Z) can be computed by

    ¬(¬X ∨ Y) ∨ (¬Y ∨ Z) ⇝ (¬¬X ∧ ¬Y) ∨ (¬Y ∨ Z)
                          ⇝ (X ∧ ¬Y) ∨ (¬Y ∨ Z)
                          ⇝ (X ∨ ¬Y ∨ Z) ∧ (¬Y ∨ ¬Y ∨ Z)


Efficient computation of the clausal form. Given a clause C, we write L(C) for
the set of literals occurring in it:

    L(X) = {X}      L(¬X) = {¬X}      L(C ∨ D) = L(C) ∪ L(D)      L(⊥) = ∅

A variable X occurs positively (resp. negatively) in C if we have X ∈ L(C)
(resp. ¬X ∈ L(C)). Up to equivalence, formulas satisfy the laws of commutative
idempotent monoids with respect to ∨ and ⊥:

    (A ∨ B) ∨ C ⇔ A ∨ (B ∨ C)      ⊥ ∨ A ⇔ A      B ∨ A ⇔ A ∨ B
                                    A ∨ ⊥ ⇔ A      A ∨ A ⇔ A

Because of this, a clause is characterized by the set of literals occurring in it,
see appendix A.2:

Lemma 2.5.5.3. Given clauses C and D, if L(C) = L(D) then C ⇔ D.


Similarly, a formula in clausal form is characterized by the set of clauses
occurring in it. A formula in clausal form A can thus be encoded as a set of sets of
literals:

    A = {{L^1_1, …, L^1_{k_1}}, …, {L^n_1, …, L^n_{k_n}}}

Note that the empty set ∅ corresponds to the formula ⊤ whereas the set {∅}
corresponds to the formula ⊥. In practice, we can represent a formula as a list
of lists of literals (where the order or repetitions of the elements of the lists do
not matter). Based on this, an algorithm for putting a formula in clausal form is
provided in figure 2.6. A literal is encoded as a pair consisting of a boolean
indicating whether it is negated or not (by convention, false means negated)
and a variable, a clause as a list of literals, and a clausal form as a list of
clauses. Given a formula A, the functions pos and neg compute the clausal form
of A and ¬A, respectively. They use the function merge which, given two
formulas in clausal form

    A = {C_1, …, C_m}   and   B = {D_1, …, D_n}

computes the clausal form of A ∨ B, which is

    {C_i ∨ D_j | 1 ≤ i ≤ m, 1 ≤ j ≤ n}

The notion of clausal form can be further improved as follows. We say that 
a formula is in canonical clausal form when 


1. it is in clausal form, 

type var = int

(** Formulas. *)
type t =
  | Var of var
  | And of t * t
  | Or of t * t
  | Imp of t * t
  | Not of t
  | True | False

type literal = bool * var (** Literal. *) (* false = negated *)
type clause = literal list (** Clause. *)
type cnf = clause list (** Clausal formula. *)

let clausal a : cnf =
  let merge a b =
    List.flatten (List.map (fun c -> List.map (fun d -> c@d) b) a)
  in
  let rec pos = function
    | Var x -> [[true, x]]
    | And (a, b) -> let a = pos a in let b = pos b in a@b
    | Or (a, b) -> let a = pos a in let b = pos b in merge a b
    | Imp (a, b) -> let a = neg a in let b = pos b in merge a b
    | Not a -> neg a
    | True -> []
    | False -> [[]]
  and neg = function
    | Var x -> [[false, x]]
    | And (a, b) -> let a = neg a in let b = neg b in merge a b
    | Or (a, b) -> let a = neg a in let b = neg b in a@b
    | Imp (a, b) -> let a = pos a in let b = neg b in a@b
    | Not a -> pos a
    | True -> [[]]
    | False -> []
  in
  pos a

Figure 2.6: Rewriting a formula to a clausal form.

2. it does not contain twice the same clause or ⊤ (this is automatic if it is
represented as a set of clauses),

3. no clause contains twice the same literal or ⊥ (this is automatic if they
are represented as sets of literals),

4. no clause contains both a literal and its negation.

For the last point, given a clause C containing both X and ¬X, the equivalences
¬X ∨ X ⇔ ⊤ and ⊤ ∨ A ⇔ ⊤ imply that the whole clause is equivalent to ⊤ and
can thus be removed from the formula. For instance, the clausal form computed
in example 2.5.5.2 is not canonical because it does not satisfy the third point
above: the clause ¬Y ∨ ¬Y ∨ Z contains the literal ¬Y twice.


Exercise 2.5.5.4. Modify the algorithm of figure 2.6 so that it computes canonical 
clausal forms. 
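
One possible way to approach this exercise, given here as a sketch rather than as the intended solution, is to post-process a clausal form instead of modifying pos and neg. The function name canonical is an assumption of this sketch; the types are those of figure 2.6, repeated so that the code is self-contained.

```ocaml
type var = int
type literal = bool * var  (* false means negated, as in figure 2.6 *)
type clause = literal list
type cnf = clause list

(* Turn a clausal form into a canonical one. *)
let canonical (a : cnf) : cnf =
  (* Point 3: sort each clause and drop repeated literals. *)
  let a = List.map (List.sort_uniq compare) a in
  (* Point 4: drop clauses containing a literal and its negation. *)
  let tautology c = List.exists (fun (n, x) -> List.mem (not n, x) c) c in
  let a = List.filter (fun c -> not (tautology c)) a in
  (* Point 2: drop repeated clauses (clauses are sorted at this point,
     so clauses with the same set of literals are equal lists). *)
  List.sort_uniq compare a
```

Sorting is used here only to obtain a unique list for each set of literals; an implementation based on a set data structure would serve the same purpose.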


De Morgan laws in intuitionistic logic. Let us insist once again on the fact that
the de Morgan laws do not hold in intuitionistic logic. Namely, the following
implications are intuitionistically true, but not their converse:

    A ∨ B ⇒ ¬(¬A ∧ ¬B)        ¬A ∨ ¬B ⇒ ¬(A ∧ B)
    A ∧ B ⇒ ¬(¬A ∨ ¬B)        ¬A ∨ B ⇒ (A ⇒ B)

However, the following equivalence does hold intuitionistically:

    ¬A ∧ ¬B ⇔ ¬(A ∨ B)
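
Seen through the correspondence between proofs and programs, the two directions of this last equivalence are ordinary functional programs, which is one way to convince oneself that no classical principle is needed. In the OCaml sketch below, the type sum, the names to_not_or and of_not_or, and the use of an answer type 'r standing in for ⊥ are assumptions made for the illustration.

```ocaml
(* A ∨ B as a sum type, A ∧ B as a pair, and ¬A as A -> 'r. *)
type ('a, 'b) sum = Left of 'a | Right of 'b

(* ¬A ∧ ¬B ⇒ ¬(A ∨ B): reason by case analysis on the disjunction. *)
let to_not_or ((f, g) : ('a -> 'r) * ('b -> 'r)) : ('a, 'b) sum -> 'r =
  function
  | Left a -> f a
  | Right b -> g b

(* ¬(A ∨ B) ⇒ ¬A ∧ ¬B: inject into the disjunction first. *)
let of_not_or (h : ('a, 'b) sum -> 'r) : ('a -> 'r) * ('b -> 'r) =
  (fun a -> h (Left a)), (fun b -> h (Right b))
```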


2.5.6 Boolean models. Classical natural deduction matches exactly the
notion of truth one would get from usual boolean models. Let us detail this. We
write 𝔹 = {0, 1} for the set of booleans. A valuation ρ is a function 𝒳 → 𝔹,
assigning booleans to variables. Such a valuation can be extended to a function
ρ : Prop → 𝔹, from propositions to booleans, by induction over the propositions
by

    ρ(X) = 1 iff ρ(X) = 1
    ρ(A ⇒ B) = 1 iff ρ(A) = 0 or ρ(B) = 1
    ρ(A ∧ B) = 1 iff ρ(A) = 1 and ρ(B) = 1
    ρ(⊤) = 1
    ρ(A ∨ B) = 1 iff ρ(A) = 1 or ρ(B) = 1
    ρ(⊥) = 0

Given a formula A and a valuation ρ, we write ⊨_ρ A whenever ρ(A) = 1
and say that the formula A is satisfied in the valuation ρ. Given a context
Γ = A_1, …, A_n, we write Γ ⊨_ρ A whenever ⊨_ρ (A_1 ∧ … ∧ A_n) ⇒ A. Finally, we
write Γ ⊨ A whenever Γ ⊨_ρ A for every valuation ρ and, in this case, say that
A is valid in the context Γ or that the sequent Γ ⊢ A is valid.
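
These definitions are directly executable. The following OCaml sketch extends a valuation to formulas and tests validity by enumerating valuations; the names eval, valuations and valid, and the convention that variables are the integers below a given bound n, are assumptions of the sketch. Negation is not a constructor here, ¬A being read as A ⇒ ⊥.

```ocaml
(* Formulas over integer variables. *)
type t =
  | Var of int
  | Imp of t * t
  | And of t * t
  | Or of t * t
  | True | False

(* Extension of a valuation rho : int -> bool to all formulas. *)
let rec eval (rho : int -> bool) = function
  | Var x -> rho x
  | Imp (a, b) -> not (eval rho a) || eval rho b
  | And (a, b) -> eval rho a && eval rho b
  | Or (a, b) -> eval rho a || eval rho b
  | True -> true
  | False -> false

(* All the valuations of the variables 0, ..., n-1 (other variables
   are sent to false). *)
let rec valuations n =
  if n = 0 then [fun _ -> false]
  else
    List.flatten
      (List.map
         (fun rho ->
            [ (fun x -> if x = n - 1 then false else rho x);
              (fun x -> if x = n - 1 then true else rho x) ])
         (valuations (n - 1)))

(* Validity of a formula a in a context gamma, all the variables being
   assumed to be smaller than n. *)
let valid n gamma a =
  let hyp = List.fold_left (fun c b -> And (c, b)) True gamma in
  List.for_all (fun rho -> eval rho (Imp (hyp, a))) (valuations n)
```

For instance, valid 1 [] (Or (Imp (Var 0, False), Var 0)) checks the excluded middle ¬X ∨ X on the two valuations of X. The enumeration is exponential in n, so this is only meant for small examples.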

The system NK is correct in the sense that it only allows the derivation of 
valid sequents. 


Theorem 2.5.6.1 (Soundness). If Γ ⊢ A is derivable then Γ ⊨ A.

Proof. By induction on the proof of Γ ⊢ A.


Since NJ is a subsystem of NK, it thus also allows only the derivation of valid 
sequents. As simple as it may seem, the above theorem allows proving the 
consistency of intuitionistic and classical logic (which was already demonstrated 
in theorem 2.3.4.2 for intuitionistic logic): 


Corollary 2.5.6.2. The system NK (and thus also NJ) is consistent. 


Proof. Suppose that NK is not consistent. By lemma 2.3.4.1, we would have a
proof of ⊢ ⊥. By theorem 2.5.6.1, we would have ρ(⊥) = 1. But ρ(⊥) = 0 by
definition, contradiction.


Conversely, we can show that the system NK is complete, meaning that if a
sequent Γ ⊢ A is valid, i.e. we have Γ ⊨ A, then it is derivable. As a particular
case, we will have that if a formula A is valid then it is provable, i.e. ⊢ A is
derivable. We first need the following lemmas.


Lemma 2.5.6.3. For any formulas A and B, variable X and valuation ρ, we have
ρ(A[B/X]) = ρ′(A), where ρ′(X) = ρ(B) and ρ′(Y) = ρ(Y) for X ≠ Y.


Proof. By induction on A. 
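Lemma 2.5.6.4. For any formula A, the formula

    ((X ⇒ A[⊤/X]) ∧ (¬X ⇒ A[⊥/X])) ⇒ A

is derivable in NK.

Proof. For conciseness, we write

    δ_X A = (X ⇒ A[⊤/X]) ∧ (¬X ⇒ A[⊥/X])

We reason by induction on the formula A. If A = X then

    δ_X X = (X ⇒ ⊤) ∧ (¬X ⇒ ⊥)

and we have

    --------------------------------- (ax)
    δ_X X, ¬X ⊢ (X ⇒ ⊤) ∧ (¬X ⇒ ⊥)
    --------------------------------- (∧ʳE)     ---------------- (ax)
    δ_X X, ¬X ⊢ ¬X ⇒ ⊥                           δ_X X, ¬X ⊢ ¬X
    -------------------------------------------------------------- (⇒E)
                          δ_X X, ¬X ⊢ ⊥
                          --------------- (¬I)
                          δ_X X ⊢ ¬¬X
                          --------------- (¬¬E)
                          δ_X X ⊢ X
                          --------------- (⇒I)
                          ⊢ δ_X X ⇒ X

If A = Y with Y ≠ X, we have

    δ_X Y = (X ⇒ Y) ∧ (¬X ⇒ Y)

and, using the fact that X ∨ ¬X is derivable,

                      ------------------------------- (ax)
                      δ_X Y, X ⊢ (X ⇒ Y) ∧ (¬X ⇒ Y)
                      ------------------------------- (∧ˡE)   -------------- (ax)
                      δ_X Y, X ⊢ X ⇒ Y                         δ_X Y, X ⊢ X
                      ------------------------------------------------------- (⇒E)        ⋮
    δ_X Y ⊢ X ∨ ¬X                      δ_X Y, X ⊢ Y                               δ_X Y, ¬X ⊢ Y
    ---------------------------------------------------------------------------------------------- (∨E)
                                          δ_X Y ⊢ Y

Other cases are left to the reader.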

Theorem 2.5.6.5 (Completeness). If Γ ⊨ A holds then Γ ⊢ A is derivable.

Proof. We proceed by induction on the number of free variables of A. If
FV(A) = ∅ then we easily show that Γ ⊢ A by induction on A. Otherwise,
pick a variable X ∈ FV(A). By lemma 2.5.6.3, the sequents Γ, X ⊢ A[⊤/X]
and Γ, ¬X ⊢ A[⊥/X] are valid, and thus derivable by induction hypothesis.
Moreover, lemma 2.5.6.4 states that δ_X A ⇒ A is derivable. We thus have the
derivation

                       Γ, X ⊢ A[⊤/X]               Γ, ¬X ⊢ A[⊥/X]
                      ----------------- (⇒I)      ------------------- (⇒I)
                       Γ ⊢ X ⇒ A[⊤/X]              Γ ⊢ ¬X ⇒ A[⊥/X]
                      -------------------------------------------------- (∧I)
    Γ ⊢ δ_X A ⇒ A                         Γ ⊢ δ_X A
    -------------------------------------------------- (⇒E)
                          Γ ⊢ A

which allows us to conclude.


A detailed and formalized version of this proof can be found in [CKA15]. 

Of course, intuitionistic natural deduction is not complete with respect to 
boolean models since there are formulas, such as ¬X ∨ X, which evaluate to
true under any valuation but are not derivable (lemma 2.3.5.2). One way to 
understand this is that there are “not enough boolean models” in order to detect 
that such formulas are not valid. A natural question is thus: is there a way to 
generalize the notion of boolean model, so that intuitionistic natural deduction 
is complete with respect to this generalized notion of model, i.e. a formula which 
is valid in any such a model is necessarily intuitionistically provable? We will 
see in section 2.8 that such a notion of model exists: Kripke models. 


2.5.7 DPLL. As an aside, we would like to present the usual algorithm to
decide the satisfiability of boolean formulas, which is based on the previous
observations. A propositional formula A is satisfiable when there exists a
valuation ρ making it true, i.e. such that ⊨_ρ A. An efficient way to test whether
this is the case or not is the DPLL algorithm, due to Davis, Putnam, Logemann
and Loveland [DLL62]. The basic idea here is the one we have already seen
in lemma 2.5.6.4: if the formula A is satisfiable by a valuation ρ then, given a
variable X occurring in A, we have either ρ(X) = 0 or ρ(X) = 1 and we can test
whether A is satisfiable in both cases recursively since this makes the number
of variables decrease in the formula (we call this splitting on the variable X):


Lemma 2.5.7.1. Given a variable X, a formula A is satisfiable if and only if the
formula A[⊥/X] or A[⊤/X] is satisfiable.


Proof. If A is satisfiable, then there is a valuation ρ such that ρ(A) = 1. If
ρ(X) = 0 (resp. ρ(X) = 1) then, by lemma 2.5.6.3, ρ(A[⊥/X]) = ρ(A) = 1
(resp. ρ(A[⊤/X]) = ρ(A) = 1) and therefore A[⊥/X] (resp. A[⊤/X]) is
satisfiable. Conversely, if A[⊥/X] (resp. A[⊤/X]) is satisfiable by a valuation ρ
then, writing ρ′ for the valuation such that ρ′(X) = 0 (resp. ρ′(X) = 1) and
ρ′(Y) = ρ(Y) for Y ≠ X, by lemma 2.5.6.3 we have ρ(A[⊥/X]) = ρ′(A) = 1
(resp. ρ(A[⊤/X]) = ρ′(A) = 1).

In the base case, the formula A has no variable and it thus evaluates to the
same value in any environment, and we can easily compute this value: it is 
satisfiable if and only if this value is true. This directly leads to a very simple 
implementation of a satisfiability algorithm, see figure 2.7: the function subst 
computes the substitution of a formula into another one, the function var finds 
a free variable, and finally the function sat tests the satisfiability of a formula. 

As is, this algorithm is not very efficient: some subformulas get evaluated 
many times during the search. It can however be much improved by using 
formulas in canonical clausal form, as described in proposition 2.5.5.1. First, 
substitution can be implemented on those as follows: 


Lemma 2.5.7.2. Given a canonical clausal formula A and a variable X, a
canonical clausal formula for A[⊤/X] (resp. A[⊥/X]) can be obtained from A by

- removing all clauses containing X (resp. ¬X),
- removing ¬X (resp. X) from all remaining clauses.


The computation can be further improved by carefully choosing the variables
we are going to split on first. A unitary clause in a clausal formula A is a
clause containing exactly one literal L. If L is X (resp. ¬X) then, if we split
on X, the branch A[⊥/X] (resp. A[⊤/X]) will fail. Therefore,


Lemma 2.5.7.3. Consider a clausal formula A containing a unitary clause which
is a literal X (resp. ¬X). Then the formula A is satisfiable if and only if the
formula A[⊤/X] (resp. A[⊥/X]) is.

A literal X (resp. ¬X) is pure in a clausal formula A if ¬X (resp. X) does not
occur in any clause of A: the variable X always occurs with the same polarity
(positive or negative) in the formula.


Lemma 2.5.7.4. A clausal formula A containing a pure literal X (resp. ¬X) is
satisfiable if and only if the formula A[⊤/X] (resp. A[⊥/X]) is satisfiable.
Another way to state the above lemma is that the clauses containing the pure 
literal can be removed from the formula without changing its satisfiability. 

The DPLL algorithm exploits these optimizations in order to test the
satisfiability of a formula A:


1. it first tries to see if A is obviously satisfiable (if it is ⊤) or unsatisfiable
(if it contains the clause ⊥),


2. otherwise it tries to find a unitary clause and apply lemma 2.5.7.3,
3. otherwise it tries to find a pure literal and apply lemma 2.5.7.4,
4. otherwise it splits on an arbitrary variable by lemma 2.5.7.1.


For the last step, various heuristics have been proposed for choosing the
splitting variable such as MOM (a variable with Maximal number of Occurrences
in the clauses of Minimum size) or Jeroslow-Wang (a variable with maximum
J(X) = Σ_C 2^(-|C|) where C ranges over clauses containing X and |C| is the
number of literals), and so on.
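
As an illustration of the last heuristic, the score J(X) can be computed as follows on clausal forms represented as lists of lists of literals. This is a sketch under assumptions: the names jw and jw_var are hypothetical, and a clause is counted as containing X whether X occurs positively or negatively in it.

```ocaml
type var = int
type literal = bool * var  (* false means negated *)
type clause = literal list
type cnf = clause list

(* Jeroslow-Wang score of x: sum of 2^(-|c|) over the clauses c in
   which the variable x occurs. *)
let jw (a : cnf) (x : var) : float =
  List.fold_left
    (fun s c ->
       if List.exists (fun (_, y) -> y = x) c
       then s +. 2. ** (-. float_of_int (List.length c))
       else s)
    0. a

(* A variable with maximal score; raises Not_found when the formula
   contains no variable. *)
let jw_var (a : cnf) : var =
  match List.sort_uniq compare (List.map snd (List.flatten a)) with
  | [] -> raise Not_found
  | x :: vars ->
    List.fold_left (fun b y -> if jw a y > jw a b then y else b) x vars
```

Short clauses weigh more than long ones, so the heuristic favors variables whose assignment is likely to produce unitary clauses quickly.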

A concrete implementation is provided in figure 2.8. The function subst
implements substitution as described in lemma 2.5.7.2, the function unit finds a
unitary clause (or raises Not_found if there is none), the function pure finds a

(** Formulas. *)
type t =
  | Var of int
  | And of t * t
  | Or of t * t
  | Not of t
  | True | False

(** Substitute a variable by a formula in a formula. *)
let rec subst x c = function
  | Var y -> if x = y then c else Var y
  | And (a, b) -> And (subst x c a, subst x c b)
  | Or (a, b) -> Or (subst x c a, subst x c b)
  | Not a -> Not (subst x c a)
  | True -> True | False -> False

(** Find a free variable in a formula. *)
let var a =
  let exception Found of int in
  let rec aux = function
    | Var x -> raise (Found x)
    | And (a, b) | Or (a, b) -> aux a; aux b
    | Not a -> aux a
    | True | False -> ()
  in
  try aux a; raise Not_found
  with Found x -> x

(** Evaluate a closed formula. *)
let rec eval = function
  | Var _ -> assert false
  | And (a, b) -> eval a && eval b
  | Or (a, b) -> eval a || eval b
  | Not a -> not (eval a)
  | True -> true | False -> false

(** Simple-minded satisfiability. *)
let rec sat a =
  try
    let x = var a in
    sat (subst x True a) || sat (subst x False a)
  with Not_found -> eval a

Figure 2.7: Naive implementation of the satisfiability algorithm.

type var = int (** Variable. *)
type literal = bool * var (** Literal. *) (* false means negated *)
type clause = literal list (** Clause. *)
type cnf = clause list (** Clausal formula. *)

(** Substitution a[v/x]. *)
let subst (a:cnf) (v:bool) (x:var) : cnf =
  let a = List.filter (fun c -> not (List.mem (v,x) c)) a in
  List.map (fun c -> List.filter (fun l -> l <> (not v, x)) c) a

(** Find a unitary clause. *)
let rec unit : cnf -> literal = function
  | [n,x]::_ -> n,x
  | _::a -> unit a
  | [] -> raise Not_found

(** Find a pure literal in a clausal formula. *)
let pure (a : cnf) : literal =
  let rec clause vars = function
    | [] -> vars
    | (n,x)::c ->
      try
        match List.assoc x vars with
        | Some n' ->
          if n' = n then clause vars c else
            let vars = List.filter (fun (y,_) -> y <> x) vars in
            clause ((x,None)::vars) c
        | None -> clause vars c
      with Not_found -> clause ((x,Some n)::vars) c
  in
  let vars = List.fold_left clause [] a in
  let x, n = List.find (function (_,Some _) -> true | _ -> false) vars in
  Option.get n, x

(** DPLL procedure. *)
let rec dpll a =
  if a = [] then true
  else if List.mem [] a then false
  else
    try let n,x = unit a in dpll (subst a n x)
    with Not_found ->
      try let n,x = pure a in dpll (subst a n x)
      with Not_found ->
        let x = snd (List.hd (List.hd a)) in
        dpll (subst a false x) || dpll (subst a true x)

Figure 2.8: Implementation of the DPLL algorithm.

pure literal (or raises Not_found) and finally the function dpll implements the
above algorithm. The function pure uses an auxiliary list vars of pairs (X, b)
where X is a variable and b is either Some true or Some false if the variable X
occurs only positively or negatively, or None if it occurs both positively and
negatively.


2.5.8 Resolution. The resolution procedure is a generalization of the previous
DPLL algorithm which was introduced by Davis and Putnam [DP60]. It is not
the most efficient algorithm, but one of the main interesting points about it is
that it generalizes well to first-order logic, see section 5.4.6. It stems from the
following observation.


Lemma 2.5.8.1 (Correctness). Suppose given two clauses of the form C ∨ X
and ¬X ∨ D, containing a variable X and its negation ¬X, respectively. Then
the formula C ∨ D is a consequence of them.


Proof. Given a valuation ρ such that ρ(C ∨ X) = ρ(¬X ∨ D) = 1,

- if ρ(X) = 1 then necessarily ρ(D) = 1 and thus ρ(C ∨ D) = 1,
- if ρ(X) = 0 then necessarily ρ(C) = 1 and thus ρ(C ∨ D) = 1.


From a logical point of view, this deduction corresponds to the following
resolution rule:

    Γ ⊢ C ∨ X    Γ ⊢ ¬X ∨ D
    ------------------------- (res)
    Γ ⊢ C ∨ D

In the following, we implicitly consider formulas up to commutativity of
disjunction, i.e. identify the formulas A ∨ B and B ∨ A, so that the above rule also
applies to clauses containing X and its negation at any position:

    Γ ⊢ C_1 ∨ X ∨ C_2    Γ ⊢ D_1 ∨ ¬X ∨ D_2
    ------------------------------------------ (res)
    Γ ⊢ C_1 ∨ C_2 ∨ D_1 ∨ D_2


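The previous lemma can be reformulated in classical logic as follows:


Lemma 2.5.8.2. The resolution rule is admissible in classical natural deduction.

Proof. We have

     Γ ⊢ C ∨ X                  Γ ⊢ ¬X ∨ D
    ------------               --------------
     Γ ⊢ C, X                   Γ ⊢ ¬X, D
    ------------ (wkR)         -------------- (wkR)
     Γ ⊢ C, X, D                Γ ⊢ C, ¬X, D
    ------------------------------------------ (¬E)
                  Γ ⊢ C, ⊥, D
                  ------------ (⊥R)
                  Γ ⊢ C, D

where the rule

     Γ ⊢ A ∨ B
    -----------
     Γ ⊢ A, B

is derivable by

                    ------------- (ax)     ------------- (ax)
     Γ ⊢ A ∨ B       Γ, A ⊢ A, B            Γ, B ⊢ A, B
    ------------------------------------------------------ (∨E)
                          Γ ⊢ A, B

(in other words, the rule (∨I) is reversible).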

Remark 2.5.8.3. If we recall that in classical logic implication can be defined as
A ⇒ B = ¬A ∨ B, the resolution rule simply corresponds to the transitivity of
implication:

    Γ ⊢ ¬C ⇒ X    Γ ⊢ X ⇒ D
    -------------------------
    Γ ⊢ ¬C ⇒ D

For simplicity, in the following a context Γ will be seen as a set of clauses (as
opposed to a list of clauses, see section 2.2.10) and will also be interpreted as a
clausal form (the conjunction of its clauses, see section 2.5.5). We will always
implicitly suppose that it is canonical (see section 2.5.5): a clause cannot contain
the same literal twice or a literal and its negation. Previous lemmas entail that
we can prove the sequent Γ ⊢ ⊥ using axiom and resolution rules only when
Γ is not satisfiable: otherwise, ⊥ would also be satisfiable, which it is not by
definition. We are going to show in theorem 2.5.8.7 that this observation admits
a converse.


Resolvent. Given clauses C ∨ X and ¬X ∨ D, the clause C ∨ D does not
contain the variable X, which gives us the idea of using resolution to remove the
variable X from a set of formulas by performing all the possible deductions we
can. Suppose given a set Γ of clauses and X a variable. We write

    Γ_X = {C | C ∨ X ∈ Γ}        Γ_¬X = {D | ¬X ∨ D ∈ Γ}

and Γ′ for the set of clauses in Γ which contain neither X nor ¬X. We supposed
that the clauses are in canonical form, so that we have the following partition
of Γ:

    Γ = Γ′ ⊎ {C ∨ X | C ∈ Γ_X} ⊎ {¬X ∨ D | D ∈ Γ_¬X}

The resolvent Γ \ X of Γ with respect to X is

    Γ \ X = Γ′ ∪ {C ∨ D | C ∈ Γ_X, D ∈ Γ_¬X}


Remark 2.5.8.4. As defined above, the resolvent might contain clauses not in
canonical form, even if C and D are. In order to keep this invariant, we should
remove all clauses of the form C ∨ D such that C contains a literal and D its
negation, which we will implicitly do; in clauses, we should also remove duplicate
literals.


As indicated above, computing the resolvent reduces the number of free variables
of Γ:


Lemma 2.5.8.5. Given Γ in clausal form and a variable X, we have

    FV(Γ \ X) = FV(Γ) \ {X}

Its main interest lies in the fact that it preserves satisfiability:

Lemma 2.5.8.6. Given a clausal form Γ and a variable X, Γ is satisfiable if and
only if Γ \ X is satisfiable.


Proof. The left-to-right implication follows from the correctness of the resolution
rule (lemma 2.5.8.1). For the right-to-left implication, suppose that Γ \ X is
satisfied under a valuation ρ. We are going to show that Γ is satisfied under
either ρ_0 or ρ_1, where ρ_i is defined, for i = 0 or i = 1, by ρ_i(X) = i and
ρ_i(Y) = ρ(Y) whenever Y ≠ X. We distinguish two cases.

- Suppose that we have ρ(C′) = 1 for every clause C = C′ ∨ X in Γ_X. Then
  we can take i = 0. Namely, given a clause C ∈ Γ = Γ′ ⊎ Γ_X ⊎ Γ_¬X:

  - if C ∈ Γ′ then ρ_0(C) = ρ(C) = 1 because C does not contain the
    variable X,
  - if C ∈ Γ_X then C = C′ ∨ X and ρ_0(C) = 1 because, by hypothesis,
    ρ_0(C′) = ρ(C′) = 1,
  - if C ∈ Γ_¬X then C = C′ ∨ ¬X and ρ_0(C) = 1 because ρ_0(¬X) = 1
    since ρ_0(X) = 0 by definition of ρ_0.

- Otherwise, there exists a clause C = C′ ∨ X ∈ Γ_X such that ρ(C′) = 0.
  Then we can take i = 1. Namely, given a clause D ∈ Γ = Γ′ ⊎ Γ_X ⊎ Γ_¬X:

  - if D ∈ Γ′ then ρ_1(D) = ρ(D) = 1 because D does not contain the
    variable X,
  - if D ∈ Γ_X then D = D′ ∨ X and ρ_1(D) = 1 because ρ_1(X) = 1,
  - if D ∈ Γ_¬X then D = D′ ∨ ¬X and ρ(C′ ∨ D′) = 1 by hypothesis, thus
    ρ_1(D′) = ρ(D′) = 1 because ρ(C′) = 0, thus ρ_1(D) = ρ_1(D′ ∨ ¬X) = 1.


The previous lemma implies that resolution is refutation complete in the sense
that it can always be used to show that a set of clauses cannot be satisfied (by
whichever valuation):


Theorem 2.5.8.7 (Refutation completeness). A set Γ of clauses is unsatisfiable
if and only if Γ ⊢ ⊥ can be proved using the axiom and resolution rules only.


Proof. Writing FV(Γ) = {X_1, …, X_n} for the free variables of Γ, define the
sequence (Γ_i)_{0≤i≤n} of sets of clauses by Γ_0 = Γ and Γ_{i+1} = Γ_i \ X_{i+1}:

- the clauses of Γ_0 can be deduced from those of Γ using the axiom rule,
- the clauses of Γ_{i+1} can be deduced from those in Γ_i using the resolution
  rule.

Lemma 2.5.8.6 ensures that Γ_i is satisfiable if and only if Γ_{i+1} is satisfiable,
and thus, by induction, Γ_0 is satisfiable if and only if Γ_n is satisfiable.
Moreover, by lemma 2.5.8.5, we have FV(Γ_n) = ∅, thus Γ_n = ∅ or Γ_n = {⊥}, and
therefore Γ_n is unsatisfiable if and only if Γ_n = {⊥}. Finally, Γ is unsatisfiable
if and only if Γ_n = {⊥}, i.e. ⊥ can be deduced from Γ using axiom and
resolution rules.


Completeness. Resolution is not complete: given a context Γ, there are clauses
which are consequences of Γ but cannot be deduced using resolution only. For
instance, from Γ = X we cannot deduce X ∨ Y using resolution only. However,
resolution can be used in order to decide whether a formula A is a consequence
of a context Γ, in the following way:


Lemma 2.5.8.8. A formula A is a consequence of a context Γ if and only if
Γ ∪ {¬A} is unsatisfiable.


Proof. Given a clausal form Γ, we have Γ ⇒ A equivalent to ¬¬(Γ ⇒ A)
equivalent to ¬(Γ ∧ ¬A), i.e. Γ ∪ {¬A} not satisfiable.


This lemma is the usual way we use resolution.

Example 2.5.8.9. We can show that given

    X ⇒ Y        Y ⇒ Z        X

we can deduce Z. Rewriting those in normal form and using the previous lemma,
this amounts to showing that Γ consisting of

    ¬X ∨ Y        ¬Y ∨ Z        X        ¬Z

is not satisfiable. Indeed, we have

    ------------ (ax)    ------------ (ax)
    Γ ⊢ ¬X ∨ Y           Γ ⊢ ¬Y ∨ Z
    --------------------------------- (res)    ------- (ax)
              Γ ⊢ ¬X ∨ Z                        Γ ⊢ X
              ----------------------------------------- (res)    -------- (ax)
                              Γ ⊢ Z                               Γ ⊢ ¬Z
                              -------------------------------------------- (res)
                                               Γ ⊢ ⊥


Implementation. We implement clausal forms using lists as in section 2.5.7. Using
this representation, the resolvent of a clausal form Γ (written g) with respect
to a variable X (written x) can be computed using the following function:


let resolve x g =
  (* clauses containing x, with x removed *)
  let gx = List.filter (List.mem (true ,x)) g in
  let gx = List.map (List.remove (true ,x)) gx in
  (* clauses containing the negation of x, with it removed *)
  let gx' = List.filter (List.mem (false,x)) g in
  let gx' = List.map (List.remove (false,x)) gx' in
  (* clauses in which x does not occur *)
  let g' = List.filter (List.for_all (fun (_,y) -> y <> x)) g in
  let disjunction c d =
    let union c d =
      List.fold_left
        (fun d l -> if List.mem l d then d else l::d)
        d c
    in
    if c = [] && d = [] then raise False
    else
      if List.exists (fun (n,x) -> List.mem (not n,x) d) c then None
      else Some (union c d)
  in
  g'@(List.filter_map_pairs disjunction gx gx')


Here, g' is Γ′, gx is Γ_X and gx' is Γ_¬X, and we return the resolvent computed
following the definition


  Γ \ X = Γ′ ∪ {C ∨ D | C ∈ Γ_X, D ∈ Γ_¬X}

The function List.filter_map_pairs, which is of type

  ('a -> 'b -> 'c option) -> 'a list -> 'b list -> 'c list


takes a function and two lists as arguments, applies the function to every pair
consisting of an element of the first list and an element of the second, and
returns the list of results which are not None. It is used to compute the
clauses C ∨ D in the definition of Γ \ X.
The disjunction is computed by disjunction, with some subtleties. Firstly, as
noted in remark 2.5.8.4, we should be careful in order to produce formulas in
canonical form:


— a disjunction C ∨ D containing both a literal and its negation should not
be added,


— in a disjunction C ∨ D, if a literal occurs twice (once in C and once in D),
we should only keep one instance.


Secondly, since we want to detect as early as possible when ⊥ can be deduced, we
raise an exception False when we find it. We can then see whether a clausal
form Γ is inconsistent by repeatedly eliminating free variables using resolution.
We use the auxiliary function free_var in order to find a free variable (it raises
Not_found if there is none), its implementation being left to the reader. By
theorem 2.5.8.7, if Γ is inconsistent then ⊥ will be produced during the process
(in which case the exception False is raised); otherwise the free variables will
be exhausted (in which case the exception Not_found is raised). This can thus
be computed with the following function:


let rec inconsistent g = 
try inconsistent (resolve (free_var g) g) 
with 
| False -> true 
| Not_found -> false 


We can then decide whether a clause is a consequence of a set of other clauses 
by applying lemma 2.5.8.8: 


let prove g c =
  inconsistent ((neg c)::g)


As an application, we can prove example 2.5.8.9 with 


let () =
  let g = [
    [false,0;true,1];
    [false,1;true,2];
    [true,0]
  ] in
  let c = [true,2] in
  assert (prove g c)
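
The functions above rely on an exception False and on helpers which are not part of the OCaml standard library. As a minimal self-contained sketch, the whole procedure can be assembled as follows. The definitions of remove, filter_map_pairs and free_var, as well as the way prove negates a clause (one unit clause per negated literal), are our own assumptions for illustration and not necessarily the author's:

```ocaml
(* Raised as soon as the empty clause (i.e. ⊥) is produced. *)
exception False

(* Hypothetical helper: remove all occurrences of x from l. *)
let remove x l = List.filter (fun y -> y <> x) l

(* Hypothetical helper: apply f to every pair made of an element of l1 and
   an element of l2, keeping the results which are not None. *)
let filter_map_pairs f l1 l2 =
  List.concat (List.map (fun a -> List.filter_map (fun b -> f a b) l2) l1)

(* Hypothetical helper: some variable occurring in g, or Not_found. *)
let free_var g =
  match List.find (fun c -> c <> []) g with
  | (_, x) :: _ -> x
  | [] -> raise Not_found

(* Resolvent of g with respect to x, as in the text. *)
let resolve x g =
  let gx = List.map (remove (true, x)) (List.filter (List.mem (true, x)) g) in
  let gx' = List.map (remove (false, x)) (List.filter (List.mem (false, x)) g) in
  let g' = List.filter (List.for_all (fun (_, y) -> y <> x)) g in
  let disjunction c d =
    if c = [] && d = [] then raise False
    else if List.exists (fun (n, y) -> List.mem (not n, y) d) c then None
    else Some (List.fold_left (fun d l -> if List.mem l d then d else l :: d) d c)
  in
  g' @ filter_map_pairs disjunction gx gx'

(* Eliminate variables one by one until ⊥ is found or none is left. *)
let rec inconsistent g =
  try inconsistent (resolve (free_var g) g) with
  | False -> true
  | Not_found -> false

(* Assumption: the negation of a clause l1 ∨ ... ∨ ln is represented by the
   unit clauses ¬l1, ..., ¬ln, which are all added to g. *)
let prove g c =
  inconsistent (List.map (fun (n, x) -> [ (not n, x) ]) c @ g)
```

With these definitions, the clausal form of example 2.5.8.9 can be checked directly, and a non-consequence such as Y from X is rejected.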


2.5.9 Double-negation translation. We have seen in section 2.3.5 that some
formulas are not provable in intuitionistic logic, whereas they are valid in
classical logic, a typical example being excluded middle ¬A ∨ A. But can we really
prove less in intuitionistic logic than in classical logic? A starting observation
is that, even though ¬A ∨ A is not provable, its double negation ¬¬(¬A ∨ A)
becomes provable, as first seen on page 63:


                                              ───────────────── (ax)
                                              ¬(¬A ∨ A), A ⊢ A
          ────────────────────────── (ax)     ────────────────────── (∨Iʳ)
          ¬(¬A ∨ A), A ⊢ ¬(¬A ∨ A)            ¬(¬A ∨ A), A ⊢ ¬A ∨ A
          ─────────────────────────────────────────────────────────── (¬E)
                                ¬(¬A ∨ A), A ⊢ ⊥
                                ───────────────── (¬I)
                                ¬(¬A ∨ A) ⊢ ¬A
  ─────────────────────── (ax)  ──────────────────── (∨Iˡ)
  ¬(¬A ∨ A) ⊢ ¬(¬A ∨ A)         ¬(¬A ∨ A) ⊢ ¬A ∨ A
  ─────────────────────────────────────────────────── (¬E)
                    ¬(¬A ∨ A) ⊢ ⊥
                    ────────────── (¬I)
                    ⊢ ¬¬(¬A ∨ A)


One of the main ingredients behind this proof is that having ¬(¬A ∨ A) as
hypothesis in a context Γ allows us to discard the current proof goal B and go
back to proving ¬A ∨ A:


  ─────────────── (ax)
  Γ ⊢ ¬(¬A ∨ A)           Γ ⊢ ¬A ∨ A
  ──────────────────────────────────── (¬E)
                Γ ⊢ ⊥
                ────── (⊥E)
                Γ ⊢ B


How is this better than proving ¬A ∨ A directly? The fact that, during the
proof, we can reset our proof goal to ¬A ∨ A! We thus start by proving ¬A ∨ A
by proving ¬A, which requires proving ⊥ from A. At this point, we change our
mind and start again the proof of ¬A ∨ A, but this time we prove A, which we
can because we gained this information from the previously “aborted” proof.
A more detailed explanation of this kind of behavior was already developed
in section 2.5.2. This actually generalizes to any formula, by a result due to
Glivenko [Gli29]. Given a context Γ, we write ¬¬Γ for the context obtained
from Γ by double-negating every formula.


Theorem 2.5.9.1 (Glivenko’s theorem). Given a context Γ and propositional
formula A, the sequent Γ ⊢ A is provable in classical logic if and only if the
sequent ¬¬Γ ⊢ ¬¬A is provable in intuitionistic logic.


This result allows us to relate the consistency of classical and intuitionistic
logic in the following way.

Theorem 2.5.9.2. Intuitionistic logic is consistent if and only if classical logic is
consistent.


Proof. Suppose that intuitionistic logic is inconsistent: there is an intuitionistic
proof of ⊥. This proof is also a valid classical proof and thus classical logic is
inconsistent. Conversely, suppose that classical logic is inconsistent. There is
a classical proof of ⊥ and thus, by theorem 2.5.9.1, an intuitionistic proof π of
¬¬⊥. However, the implication ¬¬⊥ ⇒ ⊥ holds intuitionistically:


                       ──────────── (ax)
                       ¬¬⊥, ⊥ ⊢ ⊥
  ──────────── (ax)    ──────────── (¬I)
  ¬¬⊥ ⊢ ¬¬⊥            ¬¬⊥ ⊢ ¬⊥
  ────────────────────────────────── (¬E)
              ¬¬⊥ ⊢ ⊥
              ──────────── (⇒I)
              ⊢ ¬¬⊥ ⇒ ⊥


We thus have an intuitionistic proof of ⊥:

        ⋮               π
  ⊢ ¬¬⊥ ⇒ ⊥           ⊢ ¬¬⊥
  ─────────────────────────── (⇒E)
             ⊢ ⊥


and intuitionistic logic is inconsistent.


Remark 2.5.9.3. Theorem 2.5.9.1 does not generalize as is to first-order
logic, but some other translations of classical formulas into intuitionistic logic
do. The most brutal one is due to Kolmogorov and consists in adding ¬¬
in front of every subformula. Interestingly, it corresponds to the call-by-name
continuation-passing style translation of functional programming languages. A
more economical translation is due to Gödel, transforming a formula A into the
formula A* defined by induction:


  X* = ¬¬X
  (A ∧ B)* = A* ∧ B*          (A ∨ B)* = ¬(¬A* ∧ ¬B*)
  ⊤* = ⊤                      ⊥* = ⊥
  (A ⇒ B)* = A* ⇒ B*          (¬A)* = ¬A*
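
The inductive definition of this translation can be transcribed directly as a recursive function. The following is only a sketch: the formula type with an explicit Not constructor and the name godel are assumptions made for illustration, and propositional variables are double-negated, as in the usual formulation of Gödel's translation.

```ocaml
(* Formulas with an explicit negation (hypothetical type for this sketch). *)
type t =
  | Var of string
  | Imp of t * t
  | And of t * t
  | Or of t * t
  | Not of t
  | True | False

(* Gödel's translation A ↦ A*: variables are double-negated, disjunctions
   are encoded with negation and conjunction, and the other connectives
   are translated homomorphically. *)
let rec godel = function
  | Var x -> Not (Not (Var x))
  | And (a, b) -> And (godel a, godel b)
  | Or (a, b) -> Not (And (Not (godel a), Not (godel b)))
  | Imp (a, b) -> Imp (godel a, godel b)
  | Not a -> Not (godel a)
  | True -> True
  | False -> False
```

For instance, the excluded middle ¬A ∨ A is sent to ¬(¬¬¬¬A ∧ ¬¬¬A), a formula in which no disjunction remains.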


Finally, one can wonder if, by adding four negations to a formula, we could 
gain even more proof power, but this is not the case: the process stabilizes after 
the first iteration. 


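Lemma 2.5.9.4. For every natural number n > 0, we have ¬ⁿ⁺²A ⇔ ¬ⁿA.


Proof. The implication A ⇒ ¬¬A is intuitionistically provable, as already shown
in example 2.2.5.3, as well as the implication ¬¬¬A ⇒ ¬A:


                            ───────────────── (ax)  ──────────────── (ax)
                            ¬¬¬A, A, ¬A ⊢ ¬A        ¬¬¬A, A, ¬A ⊢ A
                            ───────────────────────────────────────── (¬E)
                                        ¬¬¬A, A, ¬A ⊢ ⊥
  ─────────────── (ax)                  ──────────────── (¬I)
  ¬¬¬A, A ⊢ ¬¬¬A                        ¬¬¬A, A ⊢ ¬¬A
  ────────────────────────────────────────────────────── (¬E)
                      ¬¬¬A, A ⊢ ⊥
                      ──────────── (¬I)
                      ¬¬¬A ⊢ ¬A
                      ────────────── (⇒I)
                      ⊢ ¬¬¬A ⇒ ¬A


We conclude by induction on n.


In particular, ¬¬¬¬A ⇔ ¬¬A, so that we gain nothing by performing the
double-negation translation twice.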


2.5.10 Intermediate logics. Once again, classical logic is obtained by adding
the excluded middle ¬A ∨ A (or any of the equivalent axioms, see theorem 2.5.1.1)
to intuitionistic logic, so that new formulas are provable. A natural question is:
are there intermediate logics between intuitionistic and classical? This means:
can we add axioms to intuitionistic logic so that we get strictly more than
intuitionistic logic, but strictly less than classical logic? In more detail: are there
families of formulas, which are provable classically but not intuitionistically,
such that, by adding those as axioms to intuitionistic logic we obtain a logic in
which some classical formulas are not provable?
The answer is yes. A typical such family of axioms is the weak excluded
middle:

  ¬¬A ∨ ¬A


Namely, the formula ¬¬X ∨ ¬X is not provable in intuitionistic logic (see
lemma 2.3.5.3), so that assuming the weak excluded middle (for every
formula A) allows proving new formulas. However, the formula ¬X ∨ X does
not follow from the weak excluded middle (example 2.8.1.4). There are many
other possible families of axioms giving rise to intermediate logics such as


— linearity (or Gödel-Dummett) axiom [God32, Dum59]:


  (A ⇒ B) ∨ (B ⇒ A)


— Kreisel-Putnam axiom [KP57]:


  (¬A ⇒ (B ∨ C)) ⇒ ((¬A ⇒ B) ∨ (¬A ⇒ C))


— Scott’s axiom [KP57]:


  ((¬¬A ⇒ A) ⇒ (A ∨ ¬A)) ⇒ (¬¬A ∨ ¬A)


— Smetanich’s axiom [WZ07]:


  (¬B ⇒ A) ⇒ (((A ⇒ B) ⇒ A) ⇒ A)


— and many more [DMJ16].


Exercise 2.5.10.1. Show that the above linearity principle

  (A ⇒ B) ∨ (B ⇒ A)

is equivalent to the following global choice principle for disjunctions

  (A ⇒ B ∨ C) ⇒ (A ⇒ B) ∨ (A ⇒ C)

2.6 Sequent calculus 


Natural deduction is “natural” in the sense that it allows for a precise
correspondence between logic and computation, see chapter 4. However, it has some
flaws. From an aesthetic point of view, the rules for ∧ and ∨ are not entirely
dual, contrary to what one would expect: if they were, we could think
of reducing the work during proofs or implementations by handling them in the
same way. More annoyingly, proof search is quite difficult. Namely, suppose
that we are trying to prove

  A ∨ B ⊢ B ∨ A

The proof cannot begin with an introduction rule because we have no hope of
filling the dots:


        ⋮                            ⋮
    A ∨ B ⊢ B                    A ∨ B ⊢ A
  ─────────────── (∨Iˡ)        ─────────────── (∨Iʳ)
  A ∨ B ⊢ B ∨ A                A ∨ B ⊢ B ∨ A


This means that we have to use another rule such as (∨E)


  Γ ⊢ A ∨ B    Γ, A ⊢ C    Γ, B ⊢ C
  ─────────────────────────────────── (∨E)
                Γ ⊢ C


which requires us to come up with a formula A ∨ B which is not directly indicated
in the conclusion Γ ⊢ C, and it is not clear how to automatically generate such
formulas. Starting in this way, the proof can be ended as in example 2.2.5.2.

In order to overcome this problem, Gentzen invented sequent calculus,
which is another presentation of logic. In natural deduction, all rules operate on
the formula on the right of ⊢ and there are introduction and elimination rules.
In sequent calculus, there are only introduction rules, but those can operate
either on formulas on the left or on the right of ⊢. This results in a highly
symmetrical calculus.


2.6.1 Sequents. In sequent calculus, sequents are of the form

  Γ ⊢ Δ


where Γ and Δ are contexts: the intuition is that we have the conjunction
of formulas in Γ as hypothesis, from which we can deduce the disjunction of
formulas in Δ.


2.6.2 Rules. In all the systems we consider, unless otherwise stated, we always
suppose that we can permute, duplicate and erase formulas in context, i.e. that
the structural rules of figure 2.9 are always present. The additional rules for
sequent calculus are shown in figure 2.10 and the resulting system is called LK.
In sequent calculus, as opposed to natural deduction, the symmetry between
disjunction and conjunction has been restored: except for the axiom and cut, all
rules come in a left and right flavor. Although the presentation is quite different,
the provability power of this system is the same as the one for classical natural
deduction presented in section 2.5:


Theorem 2.6.2.1. A sequent Γ ⊢ Δ is provable in NK (figure 2.5) if and only if
it is provable in LK (figure 2.10).


  Γ, B, A, Γ′ ⊢ Δ                     Γ ⊢ Δ, B, A, Δ′
  ───────────────── (xchL)            ───────────────── (xchR)
  Γ, A, B, Γ′ ⊢ Δ                     Γ ⊢ Δ, A, B, Δ′

  Γ, A, A, Γ′ ⊢ Δ                     Γ ⊢ Δ, A, A, Δ′
  ───────────────── (contrL)          ───────────────── (contrR)
  Γ, A, Γ′ ⊢ Δ                        Γ ⊢ Δ, A, Δ′

  Γ, Γ′ ⊢ Δ                           Γ ⊢ Δ, Δ′
  ───────────────── (wkL)             ───────────────── (wkR)
  Γ, A, Γ′ ⊢ Δ                        Γ ⊢ Δ, A, Δ′

  Γ, ⊤, Γ′ ⊢ Δ                        Γ ⊢ Δ, ⊥, Δ′
  ───────────────── (⊤L)              ───────────────── (⊥R)
  Γ, Γ′ ⊢ Δ                           Γ ⊢ Δ, Δ′


Figure 2.9: Structural rules for sequent calculus


                                      Γ ⊢ A, Δ    Γ, A ⊢ Δ
  ─────────────── (ax)                ───────────────────── (cut)
  Γ, A ⊢ A, Δ                                Γ ⊢ Δ

  Γ, A, B ⊢ Δ                         Γ ⊢ A, Δ    Γ ⊢ B, Δ
  ─────────────── (∧L)                ───────────────────── (∧R)
  Γ, A ∧ B ⊢ Δ                            Γ ⊢ A ∧ B, Δ

                                      ─────────── (⊤R)
                                      Γ ⊢ ⊤, Δ

  Γ, A ⊢ Δ    Γ, B ⊢ Δ                Γ ⊢ A, B, Δ
  ───────────────────── (∨L)          ─────────────── (∨R)
      Γ, A ∨ B ⊢ Δ                    Γ ⊢ A ∨ B, Δ

  ─────────── (⊥L)
  Γ, ⊥ ⊢ Δ

  Γ ⊢ A, Δ    Γ, B ⊢ Δ                Γ, A ⊢ B, Δ
  ───────────────────── (⇒L)          ─────────────── (⇒R)
      Γ, A ⇒ B ⊢ Δ                    Γ ⊢ A ⇒ B, Δ

  Γ ⊢ A, Δ                            Γ, A ⊢ Δ
  ─────────── (¬L)                    ─────────── (¬R)
  Γ, ¬A ⊢ Δ                           Γ ⊢ ¬A, Δ


Figure 2.10: LK: rules of classical sequent calculus.


Proof. The idea is that, by induction, we can translate a proof in NK into a
proof in LK, and back. The introduction rules in NK correspond to right rules
in LK, the axiom rules match in both systems, the cut rule is admissible in NK
(the proof is similar to the one for NJ in proposition 2.3.2.1), as well as various
structural rules (shown as in section 2.2.7), so that we only have to show that
the elimination rules of NK are admissible in LK and the left rules of LK are
admissible in NK. We only handle the case of conjunction here:

— the rule (∧Eˡ) is admissible in LK:


                              ──────────────── (ax)
    Γ ⊢ A ∧ B, Δ              Γ, A, B ⊢ A, Δ
  ────────────────── (wkR)    ──────────────── (∧L)
  Γ ⊢ A ∧ B, A, Δ             Γ, A ∧ B ⊢ A, Δ
  ───────────────────────────────────────────── (cut)
                   Γ ⊢ A, Δ

and admissibility of (∧Eʳ) is similar,

— the rule (∧L) is admissible in NK:


                           ──────────────────────── (ax)        Γ, A, B ⊢ Δ
                           Γ, A ∧ B, A ⊢ A ∧ B, Δ            ──────────────────── (wkL)
  ──────────────────── (ax) ──────────────────────── (∧Eʳ)    Γ, A ∧ B, A, B ⊢ Δ
  Γ, A ∧ B ⊢ A ∧ B, Δ       Γ, A ∧ B, A ⊢ B, Δ
  ──────────────────── (∧Eˡ) ──────────────────────────────────────────────────── (cut)
  Γ, A ∧ B ⊢ A, Δ                          Γ, A ∧ B, A ⊢ Δ
  ─────────────────────────────────────────────────────────── (cut)
                        Γ, A ∧ B ⊢ Δ


Other cases are similar.


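Remark 2.6.2.2. As noted in [Gir89, chapter 5], the correspondence between
proofs in NK and LK is not bijective. For instance, the two proofs in LK


  ────────── (ax)  ────────── (ax)       ────────── (ax)  ────────── (ax)
  A, B ⊢ A         A, B ⊢ B              A, B ⊢ A         A, B ⊢ B
  ─────────────────────────── (∧R)       ─────────────────────────── (∧R)
        A, B ⊢ A ∧ B                           A, B ⊢ A ∧ B
  ─────────────────────────── (wkL)      ─────────────────────────── (wkL)
      A, B, B′ ⊢ A ∧ B                       A, B, B′ ⊢ A ∧ B
  ─────────────────────────── (wkL)      ─────────────────────────── (wkL)
    A, A′, B, B′ ⊢ A ∧ B                   A, A′, B, B′ ⊢ A ∧ B
  ─────────────────────────── (∧L)       ─────────────────────────── (∧L)
    A, A′, B ∧ B′ ⊢ A ∧ B                  A ∧ A′, B, B′ ⊢ A ∧ B
  ─────────────────────────── (∧L)       ─────────────────────────── (∧L)
   A ∧ A′, B ∧ B′ ⊢ A ∧ B                 A ∧ A′, B ∧ B′ ⊢ A ∧ B


get mapped to the same proof in NK.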


Multiplicative presentation. An alternative presentation (called multiplicative
presentation) of the rules is given in figure 2.11: instead of supposing that we have
the same context, we can “merge” the contexts of the premises in the conclusion.


Single-sided presentation. By de Morgan laws, in classical logic we can suppose
that only the variables are negated, and negated at most once, see section 2.5.5.
For instance, ¬(X ∨ ¬Y) is equivalent to ¬X ∧ Y, which satisfies this property.
Given a formula A of this form, we write A* for a formula of this form equivalent
to ¬A, which can be defined by induction:


  X* = ¬X                     (¬X)* = X
  (A ∧ B)* = A* ∨ B*          (A ∨ B)* = A* ∧ B*
  ⊤* = ⊥                      ⊥* = ⊤


We omit implication here, since it can be defined as A ⇒ B = A* ∨ B. Now, it
can be observed that proving a sequent of the form Γ, A ⊢ Δ is essentially the
same as proving the sequent Γ ⊢ A*, Δ, except that all the rules get replaced
by their opposites:


                                      Γ ⊢ A, Δ    Γ′, A ⊢ Δ′
  ──────── (ax)                       ─────────────────────── (cut)
  A ⊢ A                                    Γ, Γ′ ⊢ Δ, Δ′

  Γ, A, B ⊢ Δ                         Γ ⊢ A, Δ    Γ′ ⊢ B, Δ′
  ─────────────── (∧L)                ─────────────────────── (∧R)
  Γ, A ∧ B ⊢ Δ                        Γ, Γ′ ⊢ A ∧ B, Δ, Δ′

                                      ─────────── (⊤R)
                                      Γ ⊢ ⊤, Δ

  Γ, A ⊢ Δ    Γ′, B ⊢ Δ′              Γ ⊢ A, B, Δ
  ─────────────────────── (∨L)        ─────────────── (∨R)
  Γ, Γ′, A ∨ B ⊢ Δ, Δ′                Γ ⊢ A ∨ B, Δ

  ─────────── (⊥L)
  Γ, ⊥ ⊢ Δ

  Γ ⊢ A, Δ    Γ′, B ⊢ Δ′              Γ, A ⊢ B, Δ
  ─────────────────────── (⇒L)        ─────────────── (⇒R)
  Γ, Γ′, A ⇒ B ⊢ Δ, Δ′                Γ ⊢ A ⇒ B, Δ

  Γ ⊢ A, Δ                            Γ, A ⊢ Δ
  ─────────── (¬L)                    ─────────── (¬R)
  Γ, ¬A ⊢ Δ                           Γ ⊢ ¬A, Δ


Figure 2.11: LK: rules of classical sequent calculus (multiplicative presentation).


Lemma 2.6.2.3. A sequent Γ, A ⊢ Δ is provable in LK if and only if Γ ⊢ A*, Δ
is.


For instance the proof on the left below corresponds to the proof on the right:


  ──────── (ax)                 ──────── (ax)
  X ⊢ X                         X ⊢ X
  ────────── (¬L)               ────────── (¬R)
  X, ¬X ⊢                       ⊢ ¬X, X
  ──────────── (∧L)             ──────────── (∨R)
  X ∧ ¬X ⊢                      ⊢ ¬X ∨ X


Because of this, we can restrict the system to sequents of the form ⊢ Δ, which
are called single-sided. All the rules preserve single-sidedness except for the
axiom rule, which is easily modified in order to satisfy this property. With
some extra care, we can even come up with a presentation which does not
require any structural rules (those are admissible): the resulting presentation of
the calculus is given in figure 2.12. If we do not want to consider only formulas
where only variables can be negated, then the de Morgan laws can be added as
the following explicit rules:


  ⊢ ¬A ∨ ¬B, Δ      ⊢ ¬A ∧ ¬B, Δ      ⊢ ⊥, Δ       ⊢ ⊤, Δ       ⊢ A, Δ
  ──────────────    ──────────────    ─────────    ─────────    ──────────
  ⊢ ¬(A ∧ B), Δ     ⊢ ¬(A ∨ B), Δ     ⊢ ¬⊤, Δ      ⊢ ¬⊥, Δ      ⊢ ¬¬A, Δ

  ───────────────────── (ax)          ───────────────────── (ax)
  ⊢ Δ, A*, Δ′, A, Δ″                  ⊢ Δ, A, Δ′, A*, Δ″

  ⊢ Δ, A, Δ′    ⊢ Δ, A*, Δ′
  ────────────────────────── (cut)
           ⊢ Δ, Δ′

  ⊢ Δ, A, Δ′    ⊢ Δ, B, Δ′            ⊢ Δ, A, B, Δ′
  ────────────────────────── (∧)      ──────────────── (∨)
       ⊢ Δ, A ∧ B, Δ′                 ⊢ Δ, A ∨ B, Δ′

                                      ⊢ Δ, Δ′
  ───────────── (⊤)                   ───────────── (⊥)
  ⊢ Δ, ⊤, Δ′                          ⊢ Δ, ⊥, Δ′


Figure 2.12: LK: single-sided presentation.


2.6.3 Intuitionistic rules. In order to obtain a sequent calculus adapted to
intuitionistic logic, one should restrict the two-sided proof system to sequents of
the form Γ ⊢ A, i.e. those where the context on the right of ⊢ contains exactly
one formula. We also have to take variants of rules such as (∨R), which would
otherwise not maintain the invariant of having one formula on the right. With
a little more care, one can write rules which do not require adding structural rules
(they are admissible): the resulting calculus is presented in figure 2.13. Note
that in order for contraction to be admissible one has to keep A ⇒ B in the
context of the left premise of the rule (⇒L). Similarly to theorem 2.6.2.1, one shows:


Theorem 2.6.3.1. A sequent Γ ⊢ A is provable in NJ if and only if it is provable
in LJ.


2.6.4 Cut elimination. By a similar argument as in section 2.3.3, it can be
shown:


Theorem 2.6.4.1. A sequent Γ ⊢ Δ (resp. Γ ⊢ A) is provable in LK (resp. LJ) if
and only if it admits a proof without using the (cut) rule.


2.6.5 Proof search. From a proof-search point of view, sequent calculus is
much better behaved than natural deduction since, with the exception of
the cut rule, we do not have to come up with new formulas when searching for
proofs:


Proposition 2.6.5.1. LK has the subformula property: apart from the (cut) rule,
all the formulas occurring in the premise of a rule are subformulas of the formulas
occurring in the conclusion.


Since, by theorem 2.6.4.1, we can look for proofs without cuts, this means that
we never have to come up with a new formula during proof search! Moreover,
there is no harm in applying a rule whenever it applies thanks to the following
property:

Proposition 2.6.5.2. In LK, all the rules are reversible.


                                      Γ ⊢ A    Γ, A ⊢ B
  ────────────── (ax)                 ────────────────── (cut)
  Γ, A, Γ′ ⊢ A                              Γ ⊢ B

  Γ, A, B, Γ′ ⊢ C                     Γ ⊢ A    Γ ⊢ B
  ───────────────── (∧L)              ─────────────── (∧R)
  Γ, A ∧ B, Γ′ ⊢ C                      Γ ⊢ A ∧ B

                                      ──────── (⊤R)
                                      Γ ⊢ ⊤

  Γ, A, Γ′ ⊢ C    Γ, B, Γ′ ⊢ C        Γ ⊢ A                 Γ ⊢ B
  ───────────────────────────── (∨L)  ────────── (∨Rˡ)      ────────── (∨Rʳ)
       Γ, A ∨ B, Γ′ ⊢ C               Γ ⊢ A ∨ B             Γ ⊢ A ∨ B

  ────────────── (⊥L)
  Γ, ⊥, Γ′ ⊢ A

  Γ, A ⇒ B, Γ′ ⊢ A    Γ, B, Γ′ ⊢ C    Γ, A ⊢ B
  ───────────────────────────────── (⇒L)   ─────────── (⇒R)
       Γ, A ⇒ B, Γ′ ⊢ C               Γ ⊢ A ⇒ B

  Γ, ¬A, Γ′ ⊢ A                       Γ, A ⊢ ⊥
  ─────────────── (¬L)                ────────── (¬R)
  Γ, ¬A, Γ′ ⊢ ⊥                       Γ ⊢ ¬A


Figure 2.13: Rules of intuitionistic sequent calculus (LJ).


Implementation. We now implement proof search, which is simplest to do
using the single-sided presentation, see figure 2.12. We describe formulas as


type t =
  | Var of bool * string (* false means negated variable *)
  | Imp of t * t
  | And of t * t
  | Or of t * t
  | True | False


Using this representation, the negation of a formula can be computed with the 
function 


let rec neg = function
  | Var (n, x) -> Var (not n, x)
  | Imp (a, b) -> And (a, neg b)
  | And (a, b) -> Or (neg a, neg b)
  | Or (a, b) -> And (neg a, neg b)
  | True -> False
  | False -> True


Finally, the following function implements proof search in LK: 


let rec prove venv = function
  | [] -> false
  | a::env ->
    match a with
    | Var (n, x) ->
      List.mem (Var (not n, x)) venv ||
      prove ((Var (n, x))::venv) env
    | Imp (a, b) -> prove venv ((neg a)::b::env)
    | And (a, b) -> prove venv (a::env) && prove venv (b::env)
    | Or (a, b) -> prove venv (a::b::env)
    | True -> true
    | False -> prove venv env


Since we are considering single-sided sequents here, those can be encoded as
lists of terms. The above function takes as argument a sequent Γ′ (called venv)
and the sequent Γ (called env) to be proved. It picks a formula A in Γ and
applies the rules of figure 2.12 on it, until A is split into a list of literals:
once this is the case, those literals are put into the sequent Γ′ (of already
handled formulas). Initially, the context Γ′ is empty, and we usually want to
prove one formula A, so that we can define


let prove a = prove [] [a] 
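
To see the procedure at work, the definitions above can be gathered into one self-contained file and tried on a few formulas. This is a sketch which restates the type, neg and the search function from the text; the only changes are that the two-argument function is renamed prove_sequent (a name chosen here to avoid shadowing) and that the variables a and b are illustrative:

```ocaml
type t =
  | Var of bool * string (* false means negated variable *)
  | Imp of t * t
  | And of t * t
  | Or of t * t
  | True | False

(* Negation, pushed down to the variables by the de Morgan laws. *)
let rec neg = function
  | Var (n, x) -> Var (not n, x)
  | Imp (a, b) -> And (a, neg b)
  | And (a, b) -> Or (neg a, neg b)
  | Or (a, b) -> And (neg a, neg b)
  | True -> False
  | False -> True

(* venv: literals already handled; second argument: formulas still to be
   decomposed in the single-sided sequent. *)
let rec prove_sequent venv = function
  | [] -> false
  | a :: env ->
    match a with
    | Var (n, x) ->
      List.mem (Var (not n, x)) venv || prove_sequent (Var (n, x) :: venv) env
    | Imp (a, b) -> prove_sequent venv (neg a :: b :: env)
    | And (a, b) -> prove_sequent venv (a :: env) && prove_sequent venv (b :: env)
    | Or (a, b) -> prove_sequent venv (a :: b :: env)
    | True -> true
    | False -> prove_sequent venv env

let prove a = prove_sequent [] [a]

let a = Var (true, "A")
let b = Var (true, "B")

(* Excluded middle and Peirce's law are classically provable. *)
let excluded_middle = prove (Or (neg a, a))
let peirce = prove (Imp (Imp (Imp (a, b), a), a))
(* A ⇒ B is not provable for distinct variables A and B. *)
let not_a_theorem = prove (Imp (a, b))
```

Since all the rules of LK are reversible, the function never needs to backtrack: each formula is decomposed exactly once.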


Proof search in intuitionistic logic. Proof search can be performed in LJ, but
the situation is more subtle. First note that, similarly to the situation in LK
(proposition 2.6.5.1), we have


Proposition 2.6.5.3. LJ has the subformula property.

As an immediate consequence, we deduce


Theorem 2.6.5.4. We can decide whether a sequent Γ ⊢ A is provable in LJ or
not.


Proof. There is only a finite number of subformulas of Γ ⊢ A. We can restrict
to sequents where a formula occurs at most 3 times in the context [Gir11,
section 4.2.2] and therefore there is a finite number of possible sequents formed
with those subformulas. By testing all the possible rules, we can determine
which of those are provable, and thus determine whether the initial sequent is
provable.


The previous theorem is constructive, but the resulting algorithm is quite
inefficient.

The problem of finding proofs is more delicate than for LK because not all
the rules are reversible: (∨Rˡ), (∨Rʳ) and (⇒L) are not reversible. The rules (∨Rˡ)
and (∨Rʳ) are easy to handle when performing proof search: when trying to prove a
formula A ∨ B, we either try to prove A or to prove B. The rule (⇒L)


  Γ, A ⇒ B, Γ′ ⊢ A    Γ, B, Γ′ ⊢ C
  ─────────────────────────────────── (⇒L)
          Γ, A ⇒ B, Γ′ ⊢ C


is more difficult to handle. If we apply it naively, it can loop for the same
reasons as in section 2.4.2:


         ⋮                  ⋮
  Γ, A ⇒ B ⊢ A          Γ, B ⊢ A
  ─────────────────────────────── (⇒L)        ⋮
           Γ, A ⇒ B ⊢ A                   Γ, B ⊢ A
           ───────────────────────────────────────── (⇒L)
                        Γ, A ⇒ B ⊢ A


Although we can detect loops by looking at whether we encounter the same
sequent twice during the proof search, this is quite impractical. Also, since the
rule (⇒L) is not reversible, the order in which we apply it during proof search
is relevant, and we would like to minimize the number of times we have to
backtrack.


  Γ, X, A, Γ′ ⊢ B
  ───────────────────── (⇒X)
  Γ, X, X ⇒ A, Γ′ ⊢ B

  Γ, B ⇒ C, Γ′ ⊢ A ⇒ B    Γ, C, Γ′ ⊢ D
  ─────────────────────────────────────── (⇒⇒)
         Γ, (A ⇒ B) ⇒ C, Γ′ ⊢ D

  Γ, A ⇒ (B ⇒ C), Γ′ ⊢ D              Γ, A ⇒ C, B ⇒ C, Γ′ ⊢ D
  ──────────────────────── (⇒∧)       ───────────────────────── (⇒∨)
  Γ, (A ∧ B) ⇒ C, Γ′ ⊢ D              Γ, (A ∨ B) ⇒ C, Γ′ ⊢ D

  Γ, A, Γ′ ⊢ B                        Γ, Γ′ ⊢ B
  ─────────────────── (⇒⊤)            ─────────────────── (⇒⊥)
  Γ, ⊤ ⇒ A, Γ′ ⊢ B                    Γ, ⊥ ⇒ A, Γ′ ⊢ B


Figure 2.14: Left implication rules in LJT.

The logic LJT was introduced by Dyckhoff in order to overcome this
problem [Dyc92]. It is obtained from LJ by replacing the (⇒L) rule with the six
rules of figure 2.14, which allow proving sequents of the form


  Γ, A ⇒ B, Γ′ ⊢ C


depending on the form of A.


Proposition 2.6.5.5. A sequent is provable in LJ if and only if it is provable
in LJT.


The main interest of this variant is that proof search is always terminating (thus
the T in LJT). Moreover, the rules (⇒∧), (⇒∨), (⇒⊤) and (⇒⊥) are reversible
and can thus always be applied during proof search. Many variants of this idea
have been explored, such as the SLJ calculus [GLW99].

A proof search procedure based on this sequent calculus can be implemented
as follows. We describe terms as usual as


type t =
  | Var of string
  | Imp of t * t
  | And of t * t
  | Or of t * t
  | True | False


The procedure which determines whether a formula is provable is then shown
in figure 2.15. This procedure takes as argument two contexts Γ′ and Γ
(respectively called env' and env) and a formula A. Initially, the context Γ′ is empty;
it will be used to store the formulas of Γ which have already been “processed”.
The procedure first applies all the reversible right rules, then all the reversible
left rules; a formula of Γ which does not give rise to a reversible left rule is put
in Γ′. Once this is done, the procedure tries to apply the axiom rule, handles
disjunctions by trying to apply either (∨Rˡ) or (∨Rʳ), and, if this fails,
successively tries all the possible applications of the non-reversible rules (⇒X)
and (⇒⇒). Here the function context_formulas returns, given a context Γ, the
list of all the pairs consisting of a formula A and a context Γ′,Γ″ such that
Γ = Γ′, A, Γ″, i.e. the context Γ where some formula A has been removed.


let rec prove env' env a =
  match a with
  | True -> true
  | And (a, b) -> prove env' env a && prove env' env b
  | Imp (a, b) -> prove env' (a::env) b
  | _ -> match env with
    | b::env -> (match b with
        | And (b, c) -> prove env' (b::c::env) a
        | Or (b, c) -> prove env' (b::env) a && prove env' (c::env) a
        | True -> prove env' env a
        | False -> true
        | Imp (And (b, c), d) ->
          prove env' ((Imp (b, Imp (c,d)))::env) a
        | Imp (Or (b, c), d) ->
          prove env' ((Imp (b,d))::(Imp (c,d))::env) a
        | Imp (True , b) ->
          prove env' (b::env) a
        | Imp (False, b) ->
          prove env' env a
        | Var _ | Imp (Var _, _) | Imp (Imp (_,_),_) ->
          prove (b::env') env a
      )
    | [] ->
      match a with
      | Var _ when List.mem a env' -> true
      (* if neither disjunct is provable, fall through to the left rules *)
      | Or (a, b) when prove env' env a || prove env' env b -> true
      | a ->
        List.exists
          (fun (b, env') ->
             match b with
             | Imp (Var x, b) when List.mem (Var x) env' ->
               prove env' [b] a
             | Imp (Imp (b, c), d) ->
               prove env' [Imp (c, d)] (Imp (b, c)) && prove env' [d] a
             | _ -> false
          ) (context_formulas env')

let prove a = prove [] [] a


Figure 2.15: Proof search in LJT
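
The function context_formulas is only specified in the text, not defined. A minimal sketch of one possible implementation is given below; the auxiliary function aux and its accumulator before are our own choices for illustration:

```ocaml
(* Given a context, return every way of singling out one formula: the list
   of pairs (a, rest) where rest is the context with this occurrence of a
   removed, the order of the remaining formulas being preserved. *)
let context_formulas env =
  (* before holds, in reverse order, the formulas already traversed. *)
  let rec aux before = function
    | [] -> []
    | a :: after ->
      (a, List.rev_append before after) :: aux (a :: before) after
  in
  aux [] env
```

The function is polymorphic, so it can be tried on any list: on the three-element context [1; 2; 3] it returns the three pairs (1, [2; 3]), (2, [1; 3]) and (3, [1; 2]).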


2.7 Hilbert calculus 


The Hilbert calculus is another formalism, due to Hilbert [Hil22], which makes
design choices opposite to those of the previous formalisms (natural deduction
and sequent calculus): it has lots of axioms and very few logical rules.


2.7.1 Proofs. In this formalism, sequents are of the form Γ ⊢ A, with Γ a
context and A a formula, and are deduced according to the following two rules
only:


                                Γ ⊢ A ⇒ B    Γ ⊢ A
  ────────────── (ax)           ──────────────────── (⇒E)
  Γ, A, Γ′ ⊢ A                         Γ ⊢ B


respectively called axiom and modus ponens. Of course, there is very little that
we can deduce with only these two rules. The other necessary logical principles
are added in the form of axiom schemes, which can be assumed at any time
during the proofs. In the case of the implicational fragment (implication is the
only connective with which the formulas are built), those are


  (K) A ⇒ B ⇒ A,


  (S) (A ⇒ B ⇒ C) ⇒ (A ⇒ B) ⇒ A ⇒ C.


By “axiom schemes”, we mean that the above formulas can be assumed for any
given formulas A, B and C. In other words, this amounts to adding the rules


  ─────────────── (K)
  Γ ⊢ A ⇒ B ⇒ A

  ─────────────────────────────────────── (S)
  Γ ⊢ (A ⇒ B ⇒ C) ⇒ (A ⇒ B) ⇒ A ⇒ C


A sequent is provable when it is the conclusion of a proof built from the above
rules, and a formula A is provable when the sequent ⊢ A is provable.


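Example 2.7.1.1. For any formula A, the formula A ⇒ A is provable:


  ─────────────────────────────────────────── (S)  ──────────────────── (K)
  ⊢ (A ⇒ (B ⇒ A) ⇒ A) ⇒ (A ⇒ B ⇒ A) ⇒ A ⇒ A        ⊢ A ⇒ (B ⇒ A) ⇒ A
  ──────────────────────────────────────────────────────────────────── (⇒E)  ───────────── (K)
                       ⊢ (A ⇒ B ⇒ A) ⇒ A ⇒ A                                 ⊢ A ⇒ B ⇒ A
  ──────────────────────────────────────────────────────────────────────────────────────── (⇒E)
                                        ⊢ A ⇒ A


Note the complexity compared to NJ or LJ.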


None of the rules modify the context Γ, so that people generally omit writing
it. Also, traditionally, instead of using proof trees, a proof of A in the context Γ
is formalized as a finite sequence of formulas A_1, ..., A_n, with A_n = A,
such that, for every index i, either


— A_i belongs to Γ, or


— A_i is an instance of an axiom, or

— there are indices j, k < i such that A_k = A_j ⇒ A_i, i.e. A_i can be deduced
by

  Γ ⊢ A_j ⇒ A_i    Γ ⊢ A_j
  ────────────────────────── (⇒E)
           Γ ⊢ A_i


This corresponds to describing the proof tree by some traversal of it.


Example 2.7.1.2. The proof of example 2.7.1.1 is generally written as follows:

1. (A ⇒ (B ⇒ A) ⇒ A) ⇒ (A ⇒ B ⇒ A) ⇒ A ⇒ A by (S)

2. A ⇒ (B ⇒ A) ⇒ A by (K)

3. (A ⇒ B ⇒ A) ⇒ A ⇒ A by modus ponens on 1. and 2.

4. A ⇒ B ⇒ A by (K)

5. A ⇒ A by modus ponens on 3. and 4.
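
This sequential presentation makes proofs easy to check mechanically. As a minimal sketch for the implicational fragment, the following function verifies that every formula of a sequence is a hypothesis, an instance of (K) or (S), or follows from earlier formulas by modus ponens. The type t and the names is_k, is_s, check and identity_proof are assumptions made for illustration:

```ocaml
(* Formulas of the implicational fragment. *)
type t = Var of string | Imp of t * t

(* Instance of (K): A ⇒ B ⇒ A. *)
let is_k = function
  | Imp (a, Imp (_, a')) -> a = a'
  | _ -> false

(* Instance of (S): (A ⇒ B ⇒ C) ⇒ (A ⇒ B) ⇒ A ⇒ C. *)
let is_s = function
  | Imp (Imp (a, Imp (b, c)), Imp (Imp (a', b'), Imp (a'', c'))) ->
    a = a' && a = a'' && b = b' && c = c'
  | _ -> false

(* Check that proof is a valid sequence of formulas in the context gamma:
   seen contains the formulas already justified. *)
let check gamma proof =
  let rec aux seen = function
    | [] -> true
    | a :: rest ->
      let justified =
        List.mem a gamma || is_k a || is_s a
        (* modus ponens: some earlier B together with an earlier B ⇒ A *)
        || List.exists (fun b -> List.mem (Imp (b, a)) seen) seen
      in
      justified && aux (a :: seen) rest
  in
  aux [] proof

(* The five steps of example 2.7.1.2. *)
let a = Var "A"
let b = Var "B"
let identity_proof =
  let ba = Imp (b, a) in
  [ Imp (Imp (a, Imp (ba, a)), Imp (Imp (a, ba), Imp (a, a)));
    Imp (a, Imp (ba, a));
    Imp (Imp (a, ba), Imp (a, a));
    Imp (a, ba);
    Imp (a, a) ]
```

The checker accepts identity_proof in the empty context, whereas the one-line sequence consisting of A ⇒ A alone is rejected, since this formula is neither an axiom instance nor a hypothesis.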


2.7.2 Other connectives. In the case where connectives other than implication
are considered, appropriate axioms should be added:


  conjunction:   A ∧ B ⇒ A                           A ⇒ B ⇒ A ∧ B
                 A ∧ B ⇒ B
  truth:                                             A ⇒ ⊤
  disjunction:   A ∨ B ⇒ (A ⇒ C) ⇒ (B ⇒ C) ⇒ C       A ⇒ A ∨ B
                                                     B ⇒ A ∨ B
  falsity:       ⊥ ⇒ A
  negation:      ¬A ⇒ A ⇒ ⊥                          (A ⇒ ⊥) ⇒ ¬A


It can be observed that the axioms are in correspondence with elimination
and introduction rules in natural deduction (respectively left and right column
above). The classical variants of the system can be obtained by further adding
one of the axioms from theorem 2.5.1.1.


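2.7.3 Relationship with natural deduction. In order to show that proofs
in Hilbert calculus correspond to proofs in natural deduction, we first need to
study some of its properties. The usual structural rules are admissible in this
system:

Proposition 2.7.3.1. The rules of exchange, contraction, truth strengthening and
weakening are admissible in Hilbert calculus:


  Γ, A, B, Γ′ ⊢ C      Γ, A, A, Γ′ ⊢ C      Γ, ⊤, Γ′ ⊢ A      Γ, Γ′ ⊢ C
  ────────────────     ────────────────     ──────────────    ──────────────
  Γ, B, A, Γ′ ⊢ C      Γ, A, Γ′ ⊢ C         Γ, Γ′ ⊢ A         Γ, A, Γ′ ⊢ C


Proof. By induction on the proof of the premise.

The introduction rule for implication is also admissible. This is sometimes called
the deduction theorem and is due to Herbrand.


Proposition 2.7.3.2. The introduction rule for implication is admissible:

  Γ, A, Γ′ ⊢ B
  ──────────────── (⇒I)
  Γ, Γ′ ⊢ A ⇒ B


Proof. By induction on the proof of Γ, A, Γ′ ⊢ B.

— If it is of the form

  ────────────── (ax)
  Γ, A, Γ′ ⊢ A

then we can show ⊢ A ⇒ A by example 2.7.1.1 and thus Γ, Γ′ ⊢ A ⇒ A
by weakening.


— If it is of the form

  ────────────── (ax)
  Γ, A, Γ′ ⊢ B

with B different from A which belongs to Γ or Γ′, then we can show
B ⊢ A ⇒ B by

  ─────────────── (K)   ─────── (ax)
  B ⊢ B ⇒ A ⇒ B         B ⊢ B
  ────────────────────────────── (⇒E)
           B ⊢ A ⇒ B

and thus Γ, Γ′ ⊢ A ⇒ B by weakening.

— If B is an instance of (K) or (S), then Γ, Γ′ ⊢ B holds by the same axiom,
and we deduce Γ, Γ′ ⊢ A ⇒ B as in the previous case, using (K) and (⇒E).

— If it is of the form

  Γ, A, Γ′ ⊢ C ⇒ B    Γ, A, Γ′ ⊢ C
  ─────────────────────────────────── (⇒E)
            Γ, A, Γ′ ⊢ B

then, by induction hypothesis, we have proofs of Γ, Γ′ ⊢ A ⇒ C and of
Γ, Γ′ ⊢ A ⇒ C ⇒ B and the derivation


  ──────────────────────────────────────────── (S)
  Γ, Γ′ ⊢ (A ⇒ C ⇒ B) ⇒ (A ⇒ C) ⇒ A ⇒ B           Γ, Γ′ ⊢ A ⇒ C ⇒ B
  ──────────────────────────────────────────────────────────────────── (⇒E)
                    Γ, Γ′ ⊢ (A ⇒ C) ⇒ A ⇒ B                             Γ, Γ′ ⊢ A ⇒ C
  ───────────────────────────────────────────────────────────────────────────────────── (⇒E)
                                  Γ, Γ′ ⊢ A ⇒ B


allows us to conclude.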


We can thus show that provability in this system is the usual one.


Theorem 2.7.3.3. A sequent Γ ⊢ A is provable in Hilbert calculus if and only if
it is provable in natural deduction.


Proof. For simplicity, we restrict to the case of the implicational fragment. In
order to show that a proof in the Hilbert calculus induces a proof in NJ, we
should show that the rules (ax) and (⇒E) are admissible in NJ (this is the case
by definition) and that the axioms (S) and (K) can be derived in NJ, which is
easy. For (S), writing Γ for the context A ⇒ B ⇒ C, A ⇒ B, A, we have


  ─────────────── (ax)  ─────── (ax)    ─────────── (ax)  ─────── (ax)
  Γ ⊢ A ⇒ B ⇒ C         Γ ⊢ A           Γ ⊢ A ⇒ B         Γ ⊢ A
  ────────────────────────────── (⇒E)   ────────────────────────── (⇒E)
           Γ ⊢ B ⇒ C                             Γ ⊢ B
  ──────────────────────────────────────────────────────── (⇒E)
                          Γ ⊢ C
                ──────────────────────────── (⇒I)
                A ⇒ B ⇒ C, A ⇒ B ⊢ A ⇒ C
                ──────────────────────────── (⇒I)
                A ⇒ B ⇒ C ⊢ (A ⇒ B) ⇒ A ⇒ C
                ─────────────────────────────────── (⇒I)
                ⊢ (A ⇒ B ⇒ C) ⇒ (A ⇒ B) ⇒ A ⇒ C


(you should prove this by yourself) and, for (K),


  ─────────── (ax)
  A, B ⊢ A
  ─────────── (⇒I)
  A ⊢ B ⇒ A
  ─────────────── (⇒I)
  ⊢ A ⇒ B ⇒ A


Conversely, in order to show that a proof in NJ induces one in Hilbert
calculus, we should show that the rules (ax), (⇒E) and (⇒I) are admissible in
Hilbert calculus: the first two are by definition, and the third one was proved
in proposition 2.7.3.2.


2.8 Kripke semantics 


We have seen in section 2.5.6 that the usual boolean interpretation of formulas is
correct and complete, meaning that a formula is classically provable if and only if
it is valid in every boolean model. One can wonder if there is an analogous notion
of model for proofs in intuitionistic logic, and this is indeed the case: Kripke
models are correct and complete for intuitionistic logic. They were discovered
in the 1960s by Kripke [Kri65] and Joyal for modal logic, and can be thought
of as a semantics of possible worlds evolving through time: as time progresses
more propositions may become true. The moral is thus that intuitionistic logic
is a logic where the notion of truth is “local”, unlike classical logic.


2.8.1 Kripke structures. A Kripke structure (W, ≤, ρ) consists of a partially
ordered set (W, ≤) of worlds together with a valuation ρ : W × 𝒳 → 𝔹 which
indicates whether in a given world a given propositional variable is true or not.
The valuation is always assumed to be monotone, i.e. ρ(w, X) = 1 implies
ρ(w′, X) = 1 for all worlds w and w′ such that w ≤ w′. We sometimes simply
write W for a Kripke structure (W, ≤, ρ).

Given a Kripke structure W and a world w ∈ W, we write w ⊨_W A (or
simply w ⊨ A when W is clear from the context) when a formula A is satisfied
in w. This relation is defined by induction on A:


  w ⊨ X        iff ρ(w, X) = 1
  w ⊨ ⊤        holds
  w ⊨ ⊥        does not hold
  w ⊨ A ∧ B    iff w ⊨ A and w ⊨ B
  w ⊨ A ∨ B    iff w ⊨ A or w ⊨ B
  w ⊨ A ⇒ B    iff, for every w′ ≥ w, w′ ⊨ A implies w′ ⊨ B
  w ⊨ ¬A       iff, for every w′ ≥ w, w′ ⊨ A does not hold


A Kripke structure is often pictured as a graph whose vertices correspond to
worlds, an edge from w to w′ indicating that w ≤ w′, with the variables X
such that ρ(w, X) = 1 being written next to the node w, see examples 2.8.1.4
and 2.8.1.5.

We can think of a Kripke structure as describing the evolution of a world
through time: given two worlds such that w ≤ w′, we think of w′ as being a
possible future for w. Since the order is not necessarily total, a given world
might have different possible futures. In each world, the valuation indicates
which formulas we know are true, and the monotonicity condition ensures that
our knowledge can only grow: if we know that a formula is true then we will
still know it in the future.
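
For a finite Kripke structure, the inductive definition of satisfaction is directly executable. The following sketch assumes a representation which is ours and not the author's: the worlds are given as a list, the order and the valuation as functions, and formulas have an explicit Not constructor. The names kripke, sat and two_worlds are illustrative.

```ocaml
type t =
  | Var of string
  | Imp of t * t
  | And of t * t
  | Or of t * t
  | Not of t
  | True | False

(* A finite Kripke structure (hypothetical representation): the list of
   worlds, the partial order and the valuation, assumed monotone. *)
type 'w kripke = {
  worlds : 'w list;
  leq : 'w -> 'w -> bool;
  rho : 'w -> string -> bool;
}

(* sat k w a holds when the formula a is satisfied in the world w of k. *)
let rec sat k w = function
  | Var x -> k.rho w x
  | True -> true
  | False -> false
  | And (a, b) -> sat k w a && sat k w b
  | Or (a, b) -> sat k w a || sat k w b
  | Imp (a, b) ->
    (* in every future world where a holds, b must hold *)
    List.for_all
      (fun w' -> not (k.leq w w') || not (sat k w' a) || sat k w' b)
      k.worlds
  | Not a ->
    (* a holds in no future world *)
    List.for_all (fun w' -> not (k.leq w w') || not (sat k w' a)) k.worlds

(* Two worlds 0 ≤ 1, the variable X being true in the world 1 only. *)
let two_worlds =
  { worlds = [0; 1]; leq = (<=); rho = (fun w x -> w = 1 && x = "X") }
```

In the structure two_worlds, the formula X ∨ ¬X is not satisfied in the world 0: X does not hold there yet, and ¬X does not hold either because X becomes true in the future world 1.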


Lemma 2.8.1.1. Satisfaction is monotonic: given a formula $A$, a Kripke
structure $W$ and a world $w$, if $w\Vdash A$ then $w'\Vdash A$ for every
world $w'\geq w$.


Proof. By induction on the formula A. 


Given a context $\Gamma=A_1,\ldots,A_n$, a formula $A$, and a Kripke
structure $W$, we write $\Gamma\Vdash_W A$ when, for every world $w\in W$ in
which all the formulas $A_i$ are satisfied, the formula $A$ is also satisfied.
We write $\Gamma\Vdash A$ when $\Gamma\Vdash_W A$ holds for every structure
$W$: in this case, we say that $A$ is valid in the context $\Gamma$.


Remark 2.8.1.2. It should be observed that the notion of Kripke structure
generalizes the notion of boolean model recalled in section 2.5.6. Namely, a
boolean valuation $\rho:\mathcal{X}\to\mathbb{B}$ can be seen as a Kripke
structure $W$ with a single world $w$, the valuation being given by $\rho$.
The notion of validity for Kripke structures defined above then coincides with
the one for boolean models.


The following theorem ensures that Kripke semantics is sound: a provable
formula is valid.


Theorem 2.8.1.3 (Soundness). If a sequent $\Gamma\vdash A$ is derivable in
intuitionistic logic then $\Gamma\Vdash A$.


Proof. By induction on the proof of $\Gamma\vdash A$.


The contrapositive of this theorem says that if we can find a Kripke structure 
in which there is a world where a formula A is not satisfied, then A is not 
intuitionistically provable. This thus provides an alternative to methods based 
on cut-elimination (see section 2.3.5) in order to establish the non-provability 
of formulas. 


Example 2.8.1.4. Consider the formula expressing double negation elimination
$\neg\neg X\Rightarrow X$ and the Kripke structure with $W=\{w_0,w_1\}$, with
$w_0\leq w_1$, and $\rho(w_0,X)=0$ and $\rho(w_1,X)=1$, which can be pictured
as

$$w_0\longrightarrow\overset{X}{w_1}$$

We have $w_0\nVdash\neg X$ (because there is the future world $w_1$ in which
$X$ holds) and $w_1\nVdash\neg X$, and thus $w_0\Vdash\neg\neg X$ (in fact, it
can be shown that $w\Vdash\neg\neg X$ in an arbitrary structure iff for every
world $w'\geq w$ there exists a world $w''\geq w'$ such that $w''\Vdash X$).
Moreover, we have $w_0\nVdash X$, and thus
$w_0\nVdash\neg\neg X\Rightarrow X$. This shows that $\neg\neg X\Rightarrow X$
is not intuitionistically provable. In the same Kripke structure, we have
$w_0\nVdash\neg X\lor X$ and thus the excluded middle $\neg X\lor X$ is not
intuitionistically provable either.
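
The computations of this example can be replayed mechanically on a finite
structure. The following is only a sketch: the type names `formula` and
`kripke`, the function `forces` and the encoding of worlds as integers are
choices made for this illustration, and the structure is assumed to be finite
so that quantifying over future worlds is a list traversal.

```ocaml
(* Formulas of propositional logic. *)
type formula =
  | Var of string
  | Top
  | Bot
  | And of formula * formula
  | Or of formula * formula
  | Imp of formula * formula
  | Neg of formula

(* A finite Kripke structure (hypothetical representation): worlds are
   integers, [leq] is the partial order and [rho w x] tells whether the
   variable x holds in the world w. *)
type kripke = {
  worlds : int list;
  leq : int -> int -> bool;
  rho : int -> string -> bool;
}

(* [forces k w a] decides whether the formula a is satisfied in world w,
   following the inductive definition of satisfaction. *)
let rec forces k w a =
  let futures = List.filter (fun w' -> k.leq w w') k.worlds in
  match a with
  | Var x -> k.rho w x
  | Top -> true
  | Bot -> false
  | And (a, b) -> forces k w a && forces k w b
  | Or (a, b) -> forces k w a || forces k w b
  | Imp (a, b) ->
    List.for_all (fun w' -> not (forces k w' a) || forces k w' b) futures
  | Neg a -> List.for_all (fun w' -> not (forces k w' a)) futures

(* The structure of the example: w0 <= w1 and X holds only in w1. *)
let k = {
  worlds = [0; 1];
  leq = (fun w w' -> w <= w');
  rho = (fun w x -> w = 1 && x = "X");
}

let x = Var "X"
let dne = Imp (Neg (Neg x), x)   (* double negation elimination *)
let em = Or (Neg x, x)           (* excluded middle *)
```

With these definitions, `forces k 0 dne` and `forces k 0 em` both evaluate to
`false`, whereas `forces k 0 (Neg (Neg x))` evaluates to `true`.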

Given an arbitrary formula $A$, by lemma 2.8.1.1, in this structure, this
formula is either satisfied both in $w_0$ and $w_1$, or only in $w_1$, or in
no world:

$$
\overset{A}{w_0}\longrightarrow\overset{A}{w_1}
\qquad\qquad
w_0\longrightarrow\overset{A}{w_1}
\qquad\qquad
w_0\longrightarrow w_1
$$

In the first two cases, $\neg\neg A$ is satisfied and in the last one $\neg A$
is satisfied. Therefore, the weak excluded middle $\neg\neg A\lor\neg A$ is
always satisfied: this shows that the weak excluded middle does not imply the
excluded middle. By using a similar reasoning, it can be shown that the
linearity axiom $(A\Rightarrow B)\lor(B\Rightarrow A)$ does not imply the
excluded middle. Both thus give rise to intermediate logics, see
section 2.5.10.


Example 2.8.1.5. The Kripke structure

$$\overset{X}{\cdot}\longleftarrow\cdot\longrightarrow\overset{Y}{\cdot}$$

shows that the linearity formula $(X\Rightarrow Y)\lor(Y\Rightarrow X)$ is not
intuitionistically provable (whereas classically it is).


2.8.2 Completeness. We now consider the converse of theorem 2.8.1.3. We will
show that if a formula is valid then it is provable intuitionistically. Or
equivalently, that if a formula is not provable, then we can always exhibit a
Kripke structure in which it does not hold.

Given a possibly infinite set $\Phi$ of formulas, we write $\Phi\vdash A$
whenever there exists a finite subset $\Gamma\subseteq\Phi$ such that
$\Gamma\vdash A$ is intuitionistically provable. Such a set is consistent if
$\Phi\nvdash\bot$. By lemma 2.3.4.1, we have:


Lemma 2.8.2.1. A set $\Phi$ of formulas is consistent if and only if there is
a formula $A$ such that $\Phi\nvdash A$.

A set $\Phi$ of formulas is disjunctive if $\Phi\vdash A\lor B$ implies
$\Phi\vdash A$ or $\Phi\vdash B$.


Lemma 2.8.2.2. Given a set $\Phi$ of formulas and a formula $A$ such that
$\Phi\nvdash A$, there exists a disjunctive set $\overline\Phi$ such that
$\Phi\subseteq\overline\Phi$ and $\overline\Phi\nvdash A$.


Proof. Suppose fixed an enumeration of all the formulas of the form $B\lor C$
occurring in $\Phi$. We construct by induction a sequence $\Phi_n$ of sets of
formulas which are such that $\Phi_n\nvdash A$. We set $\Phi_0=\Phi$. Suppose
$\Phi_n$ constructed and consider the $n$-th formula $B\lor C$ in $\Phi$:


— if $\Phi_n,B\nvdash A$, we define $\Phi_{n+1}=\Phi_n\cup\{B\}$,


— if $\Phi_n,B\vdash A$, we define $\Phi_{n+1}=\Phi_n\cup\{C\}$.


In the first case, it is obvious that $\Phi_{n+1}\nvdash A$. In the second
one, we have $\Phi_n\vdash B\lor C$ and $\Phi_n,B\vdash A$: if we also had
$\Phi_n,C\vdash A$ then, by $(\lor_\mathrm{E})$, we would also have
$\Phi_n\vdash A$, which is excluded by induction hypothesis. Finally, we take
$\overline\Phi=\bigcup_{n\in\mathbb{N}}\Phi_n$.

A set $\Phi$ of formulas is saturated if, for every formula $A$,
$\Phi\vdash A$ implies $A\in\Phi$.

Lemma 2.8.2.3. Given a set $\Phi$ of formulas, the set
$\overline\Phi=\{A\mid\Phi\vdash A\}$ is saturated.


A set is complete if it is consistent, disjunctive and saturated. Combining the
above lemmas we obtain:


Lemma 2.8.2.4. Given a set $\Phi$ of formulas and a formula $A$ such that
$\Phi\nvdash A$, there exists a complete set $\overline\Phi$ such that
$\Phi\subseteq\overline\Phi$ and $A\notin\overline\Phi$.


Proof. By lemma 2.8.2.2, there exists a disjunctive set of formulas
$\overline\Phi$ such that $\Phi\subseteq\overline\Phi$ and
$\overline\Phi\nvdash A$. This set is consistent by lemma 2.8.2.1. Moreover, by
lemma 2.8.2.3, we can suppose that this set is saturated (the construction is
easily checked to preserve consistency and disjunctiveness) and such that
$A\notin\overline\Phi$.

The universal Kripke structure $W$ is defined by

$$W=\{w_\Phi\mid\Phi\text{ is complete}\}$$

with $w_\Phi\leq w_{\Phi'}$ whenever $\Phi\subseteq\Phi'$, and
$\rho(w_\Phi,X)=1$ iff $X\in\Phi$.


Lemma 2.8.2.5. Let $\Phi$ be a complete set and $A$ a formula. Then
$w_\Phi\Vdash A$ iff $A\in\Phi$.


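Proof. By induction on the formula $A$.


If $A=X$ is a propositional variable, we have $w_\Phi\Vdash X$ iff
$\rho(w_\Phi,X)=1$ iff $X\in\Phi$.


If $A=\top$, we always have $w_\Phi\Vdash\top$ and we always have
$\top\in\Phi$ because $\Phi\vdash\top$ by $(\top_\mathrm{I})$ and $\Phi$ is
saturated.


If $A=\bot$, we never have $w_\Phi\Vdash\bot$ and we never have
$\bot\in\Phi$ because $\Phi$ is consistent.


If $A=B\land C$.

Suppose that $w_\Phi\Vdash B\land C$. Then $w_\Phi\Vdash B$ and
$w_\Phi\Vdash C$, and therefore $\Phi\vdash B$ and $\Phi\vdash C$ by induction
hypothesis. We deduce $\Phi\vdash B\land C$ by $(\land_\mathrm{I})$ and
weakening and thus $B\land C\in\Phi$ by saturation.

Conversely, suppose $B\land C\in\Phi$ and thus $\Phi\vdash B\land C$. This
entails $\Phi\vdash B$ and $\Phi\vdash C$ by $(\land_\mathrm{E}^l)$ and
$(\land_\mathrm{E}^r)$, and thus $B\in\Phi$ and $C\in\Phi$ by saturation. By
induction hypothesis, we have $w_\Phi\Vdash B$ and $w_\Phi\Vdash C$, and thus
$w_\Phi\Vdash B\land C$.


If $A=B\lor C$.

Suppose that $w_\Phi\Vdash B\lor C$. Then $w_\Phi\Vdash B$ or
$w_\Phi\Vdash C$, and therefore $\Phi\vdash B$ or $\Phi\vdash C$ by induction
hypothesis. We deduce $\Phi\vdash B\lor C$ by $(\lor_\mathrm{I}^l)$ or
$(\lor_\mathrm{I}^r)$ and thus $B\lor C\in\Phi$ by saturation.

Conversely, suppose $B\lor C\in\Phi$. Since $\Phi$ is disjunctive and
saturated, we have $B\in\Phi$ or $C\in\Phi$. By induction hypothesis, we have
$w_\Phi\Vdash B$ or $w_\Phi\Vdash C$, and thus $w_\Phi\Vdash B\lor C$.


If $A=B\Rightarrow C$.

Suppose that $w_\Phi\Vdash B\Rightarrow C$. Our goal is to show
$B\Rightarrow C\in\Phi$. By $(\Rightarrow_\mathrm{I})$ and saturation, it is
enough to show $\Phi,B\vdash C$. Suppose that it is not the case. By
lemma 2.8.2.4, applied to the set $\Phi\cup\{B\}$ and the formula $C$, we can
construct a complete set $\Phi'$ with $\Phi\subseteq\Phi'$, $B\in\Phi'$ and
$C\notin\Phi'$. Since $B\in\Phi'$, by induction hypothesis we have
$w_{\Phi'}\Vdash B$. Therefore, since $w_\Phi\Vdash B\Rightarrow C$ and
$w_\Phi\leq w_{\Phi'}$, we have $w_{\Phi'}\Vdash C$, i.e. $C\in\Phi'$ by
induction hypothesis, a contradiction.

Conversely, suppose $B\Rightarrow C\in\Phi$. Given $\Phi'$ such that
$w_\Phi\leq w_{\Phi'}$, i.e. $\Phi\subseteq\Phi'$, if $w_{\Phi'}\Vdash B$ we
have to show $w_{\Phi'}\Vdash C$. By induction hypothesis, we have
$B\in\Phi'$. Moreover, we have $\Phi'\vdash B\Rightarrow C$ (because
$\Phi\vdash B\Rightarrow C$ and $\Phi\subseteq\Phi'$) and therefore
$\Phi'\vdash C$ by $(\Rightarrow_\mathrm{E})$. By saturation we have
$C\in\Phi'$ and thus $w_{\Phi'}\Vdash C$, by induction hypothesis.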


Theorem 2.8.2.6 (Completeness). If $\Gamma\Vdash A$ then $\Gamma\vdash A$.


Proof. Suppose $\Gamma\Vdash A$ and $\Gamma\nvdash A$. By lemma 2.8.2.4, there
exists a complete set of formulas $\Phi$ such that $\Gamma\subseteq\Phi$ and
$\Phi\nvdash A$. All the formulas of $\Gamma$ are valid in $w_\Phi$ and thus
$w_\Phi\Vdash A$, because we have $\Gamma\Vdash A$. By lemma 2.8.2.5, we
therefore have $A\in\Phi$, a contradiction.

Remark 2.8.2.7. It can be shown that we can restrict to Kripke models which are
tree-shaped and finite without losing completeness. With further restrictions,
various completeness results have been obtained. As an extreme example, if
we restrict to models with only one world, then we obtain boolean models
(remark 2.8.1.2) which are complete for classical logic (theorem 2.5.6.5). For a
more unexpected example, Kripke models which are total orders are complete
for intuitionistic logic extended with the linearity axiom (section 2.5.10),
hence its name.


CHAPTER 3 


Pure $\lambda$-calculus


We now introduce the $\lambda$-calculus, which is the functional core of a
programming language: this is what you obtain when you remove everything from
a functional programming language except for the variables, functions and
application. In this language everything is thus a function. In the OCaml
syntax, a typical $\lambda$-term would thus be


fun f x -> f (f x) (fun y -> y)


Since $\lambda$-calculus was actually invented before computers existed, the
traditional notation is somewhat different from the above, and we write
$\lambda x.t$ instead of fun x -> t so that the above term would rather be
written

$$\lambda fx.f(fx)(\lambda y.y)$$


Bound variables. In a function, the name of the variable is not important, it
could be replaced by any other name without changing the meaning of the
function: we should consider $\lambda x.x$ and $\lambda y.y$ as the same. In a
term of the form $\lambda x.t$, we say that the abstraction $\lambda$ binds the
variable $x$ in the term $t$: the name of the variable $x$ in $t$ is not
really relevant, what matters is that this is the variable which was declared
by this $\lambda$. In mathematics, we are somewhat used to this in other
situations than functions. For instance, in the first definition below, $t$ is
bound by the limit operator, in the second $t$ is bound by the $\mathrm{d}t$
operator coming with the integral, and in the last one the summation sign is
binding $i$:

$$
f(x)=\lim_{t\to\infty}\frac{x}{t}
\qquad\qquad
f(x)=\int_0^1 tx\,\mathrm{d}t
\qquad\qquad
f(x)=\sum_{i=0}^{n} ix
$$

This means that we can replace the name of the bound variable by any other
(as above) without changing the meaning of the expression. For instance, the
first one is equivalent to

$$\lim_{z\to\infty}\frac{x}{z}$$

This process of changing the name of the variable is called
$\alpha$-conversion and is more subtle than it seems at first: there are
actually some restrictions on the names of the variables we can use. For
instance, in the above example, we cannot rename $t$ to $x$ since the following
reasoning is clearly not valid:

$$
0=\lim_{t\to\infty}\frac{x}{t}
=\lim_{z\to\infty}\frac{x}{z}
=\lim_{x\to\infty}\frac{x}{x}
$$

The problem here is that we tried to change the name of $t$ to a variable name
which was already used somewhere else. These issues are generally glossed over
in mathematics, but in computer science we cannot simply do that: we have to
understand in detail these $\alpha$-conversion mechanisms when implementing
functional programming languages, otherwise we will evaluate programs
incorrectly. Believe it or not, this simple matter is a major source of bugs
and headaches.

Evaluation. Another aspect we have to make precise is the notion of evaluation
or reduction in a functional programming language. In mathematics, if $f$ is
the doubling function $f(x)=x+x$, then $f(3)$ is $3+3$, i.e. $6$: with our
$\lambda$ notation, we have $(\lambda x.x+x)3=6$. In computer science, we want
to see the way the program is executed and we will consider that
$(\lambda x.x+x)3$ reduces to $3+3$, which will itself reduce to $6$, which is
the result of our program. The general definition of the reduction in the
language is given by the $\beta$-reduction rule, which is

$$(\lambda x.t)u\longrightarrow_\beta t[u/x]$$

It means that the function which to $x$ associates some expression $t$, when
applied to an argument $u$, reduces to $t$ where all the occurrences of $x$
have been replaced by $u$. The properties of this reduction relation are one
of our main objects of interest here.


In this chapter. We introduce the $\lambda$-calculus in section 3.1 and the
$\beta$-reduction in section 3.2. We then study the computational power of the
resulting calculus in section 3.3 and show that reduction is confluent in
section 3.4. We discuss the various ways in which reduction can be implemented
in section 3.5, and the ways to handle $\alpha$-conversion in section 3.6.


References. Should you need a more detailed presentation of
$\lambda$-calculus, its properties and applications, good introductions
include [Bar84, SU06, Sel08].


3.1 $\lambda$-terms


3.1.1 Definition. Suppose fixed an infinite countable set
$\mathcal{X}=\{x,y,z,\ldots\}$ whose elements are called variables. The set
$\Lambda$ of $\lambda$-terms is generated by the following grammar:

$$t,u\;::=\;x\;\mid\;t\,u\;\mid\;\lambda x.t$$


This means that a $\lambda$-term is either

— a variable $x$,


— an application $t\,u$, which is a pair of terms $t$ and $u$, thought of as
applying the function $t$ to an argument $u$,


— an abstraction $\lambda x.t$, which is a pair consisting of a variable $x$
and a term $t$, thought of as the function which to $x$ associates $t$.


For instance, we have the following $\lambda$-terms

$$\lambda x.x\qquad\qquad(\lambda x.(xx))(\lambda y.(yx))\qquad\qquad\lambda x.(\lambda y.(x(\lambda z.y)))$$
By convention,


— application is associative to the left, i.e.

$$tuv=(tu)v$$

and not $t(uv)$,

— application binds more tightly than abstraction, i.e.

$$\lambda x.xy=\lambda x.(xy)$$

and not $(\lambda x.x)y$ (in other words, abstraction extends as far as
possible on the right),


— we sometimes group abstractions, i.e.

$\lambda xyz.xz(yz)$ is read as $\lambda x.\lambda y.\lambda z.xz(yz)$.
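
The grammar and the parenthesizing conventions above can be sketched in OCaml
as follows. The constructor names `Var`, `App` and `Abs` and the printer
`to_string` are choices made for this illustration, not fixed by the text; the
printer applies the first two conventions but, for simplicity, does not group
successive abstractions.

```ocaml
(* Abstract syntax of lambda-terms: variables, applications, abstractions. *)
type term =
  | Var of string
  | App of term * term
  | Abs of string * term

(* Print a term with few parentheses: application associates to the left
   and abstraction extends as far as possible to the right. *)
let rec to_string t =
  match t with
  | Var x -> x
  | Abs (x, t) -> "λ" ^ x ^ "." ^ to_string t
  | App (t, u) -> left t ^ " " ^ right u

(* In function position, only an abstraction needs parentheses. *)
and left t =
  match t with
  | Abs _ -> "(" ^ to_string t ^ ")"
  | _ -> to_string t

(* In argument position, everything but a variable gets parentheses. *)
and right t =
  match t with
  | Var x -> x
  | _ -> "(" ^ to_string t ^ ")"

(* The term λxyz.xz(yz). *)
let s =
  Abs ("x", Abs ("y", Abs ("z",
    App (App (Var "x", Var "z"), App (Var "y", Var "z")))))
```

For instance, `to_string s` yields the string `λx.λy.λz.x z (y z)`.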


3.1.2 Bound and free variables. In a term of the form $\lambda x.t$, the
variable $x$ is said to be bound in the term $t$: in a sense the abstraction
“declares” the variable in the term $t$, and all occurrences of $x$ in $t$
will make reference to the variable declared here (unless it is bound again).
Thus, in the term $(\lambda x.xy)x$, the first occurrence of $x$ refers to the
variable declared by the abstraction whereas the second does not. Intuitively,
this term is the same as $(\lambda z.zy)x$, but not as $(\lambda z.zy)z$; this
will be made formal below through the notion of $\alpha$-equivalence, but we
should keep in mind that there is always the possibility of renaming bound
variables.

A free variable in a term is a variable which is not bound in a subterm. We
define the set $\mathrm{FV}(t)$ of free variables of a term $t$, by induction
on $t$, by

$$
\begin{array}{rcl}
\mathrm{FV}(x)&=&\{x\}\\
\mathrm{FV}(t\,u)&=&\mathrm{FV}(t)\cup\mathrm{FV}(u)\\
\mathrm{FV}(\lambda x.t)&=&\mathrm{FV}(t)\setminus\{x\}
\end{array}
$$

A term $t$ is closed when it has no free variable: $\mathrm{FV}(t)=\emptyset$.


Example 3.1.2.1. The set of free variables of the term $(\lambda x.xy)z$ is
$\{y,z\}$. This term is thus not closed. The term $\lambda xy.x$ is closed.
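
The three equations defining free variables translate directly into a
recursive function. This is a minimal sketch in which sets of variables are
represented by sorted lists without duplicates; the names `term`, `fv` and
`closed` are assumptions of this illustration.

```ocaml
type term =
  | Var of string
  | App of term * term
  | Abs of string * term

(* Free variables of a term, as a sorted list without duplicates. *)
let rec fv t =
  match t with
  | Var x -> [x]
  | App (t, u) -> List.sort_uniq compare (fv t @ fv u)
  | Abs (x, t) -> List.filter (fun y -> y <> x) (fv t)

(* A term is closed when it has no free variable. *)
let closed t = fv t = []

(* The two terms of the example: (λx.xy)z and λxy.x. *)
let t1 = App (Abs ("x", App (Var "x", Var "y")), Var "z")
let t2 = Abs ("x", Abs ("y", Var "x"))
```

Here `fv t1` is the list `["y"; "z"]` and `closed t2` holds.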


A variable $x$ is fresh with respect to a term $t$ when it does not occur as a
free variable in $t$, i.e. $x\in\mathcal{X}\setminus\mathrm{FV}(t)$. Note that
the set of variables of a term $t$ is finite and the set of variables is
infinite so that we can always find a fresh variable with respect to any term.


3.1.3 Renaming and $\alpha$-equivalence. In order to define
$\alpha$-equivalence, we first define the operation of renaming a variable $x$
to $y$ in a term $t$, and write

$$t\{y/x\}$$

for the resulting term. There is one subtlety though: we only want to rename
free occurrences of $x$, since the other ones refer to the abstraction to which
they are bound. Formally, the renaming $t\{y/x\}$ is defined by

$$
\begin{array}{rcll}
x\{y/x\}&=&y\\
z\{y/x\}&=&z&\text{if } z\neq x\\
(t\,u)\{y/x\}&=&(t\{y/x\})\,(u\{y/x\})\\
(\lambda x.t)\{y/x\}&=&\lambda x.t\\
(\lambda z.t)\{y/x\}&=&\lambda z.(t\{y/x\})&\text{if } z\neq x \text{ and } z\neq y\\
(\lambda y.t)\{y/x\}&=&\lambda z.(t\{z/y\}\{y/x\})&\text{for some } z \text{ with } z\notin\mathrm{FV}(t)\cup\{x,y\}
\end{array}
$$

The last three lines handle the possible cases when renaming a variable in an
abstraction: either we are trying to rename the bound variable, or the bound
variable and variables involved in the renaming are distinct, or we are trying
to rename a variable into the bound variable.

The $\alpha$-equivalence $=_\alpha$ (or $\alpha$-conversion) is the smallest
congruence (see below) on terms which identifies terms differing only by
renaming bound variables, i.e.

$$\lambda x.t=_\alpha\lambda y.(t\{y/x\})$$

whenever $y$ is not free in $t$. For instance, we have

$$\lambda x.x(\lambda x.xy)=_\alpha\lambda z.z(\lambda x.xy)\neq_\alpha\lambda y.y(\lambda x.xy)$$


Formally, the fact that $\alpha$-equivalence is a congruence means that it is
the smallest relation such that whenever all the relations above the bar hold,
the relation below the bar also holds:

$$
\frac{y\notin\mathrm{FV}(t)}{\lambda x.t=_\alpha\lambda y.(t\{y/x\})}
$$

$$
\frac{t=_\alpha t'\qquad u=_\alpha u'}{t\,u=_\alpha t'\,u'}
\qquad\qquad
\frac{t=_\alpha t'}{\lambda x.t=_\alpha\lambda x.t'}
$$

$$
\frac{}{t=_\alpha t}
\qquad\qquad
\frac{t=_\alpha t'}{t'=_\alpha t}
\qquad\qquad
\frac{t=_\alpha t'\qquad t'=_\alpha t''}{t=_\alpha t''}
$$

The equation on the first line is the one we have already seen above, those
on the second line ensure that $\alpha$-equivalence is compatible with
application and abstraction, and those on the third line impose that
$\alpha$-equivalence is an equivalence relation (i.e. reflexive, symmetric and
transitive).
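
In an implementation, one usually does not search for a derivation with these
rules but decides $\alpha$-equivalence by a parallel traversal of the two
terms. The sketch below is one possible way of doing so, not the method of the
text: it records the binders met so far and requires bound occurrences to
point to binders at the same position. The names `alpha_eq`, `index` and `eq`
are assumptions of this illustration.

```ocaml
type term =
  | Var of string
  | App of term * term
  | Abs of string * term

(* [alpha_eq t u] decides alpha-equivalence. The lists [env1] and [env2]
   contain the variables bound so far in each term, innermost first: two
   bound occurrences match when they refer to binders at the same position,
   two free occurrences match when they have the same name. *)
let alpha_eq t u =
  (* Position of the first occurrence of x in the list l, if any. *)
  let rec index x l =
    match l with
    | [] -> None
    | y :: l -> if x = y then Some 0 else Option.map succ (index x l)
  in
  let rec eq env1 env2 t u =
    match t, u with
    | Var x, Var y ->
      (match index x env1, index y env2 with
       | Some i, Some j -> i = j
       | None, None -> x = y
       | _ -> false)
    | App (t1, t2), App (u1, u2) ->
      eq env1 env2 t1 u1 && eq env1 env2 t2 u2
    | Abs (x, t), Abs (y, u) -> eq (x :: env1) (y :: env2) t u
    | _ -> false
  in
  eq [] [] t u

(* The body λx.xy shared by the three terms of the example above. *)
let body = Abs ("x", App (Var "x", Var "y"))
let t1 = Abs ("x", App (Var "x", body))   (* λx.x(λx.xy) *)
let t2 = Abs ("z", App (Var "z", body))   (* λz.z(λx.xy) *)
let t3 = Abs ("y", App (Var "y", body))   (* λy.y(λx.xy) *)
```

On the example above, `alpha_eq t1 t2` holds whereas `alpha_eq t1 t3` does
not, because the free variable y of the first term would become bound.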


3.1.4 Substitution. Given $\lambda$-terms $t$ and $u$ and a variable $x$, we
can define a new term

$$t[u/x]$$

which is the $\lambda$-term obtained from $t$ by replacing free occurrences of
$x$ by $u$. Again we have to properly take care of issues related to the fact
that some variables are bound:


— we only want to replace free occurrences of the variable $x$ in $t$, since
the bound ones refer to the corresponding abstractions in $t$ and might be
renamed, i.e.

$$(x(\lambda xy.x))[u/x]=u(\lambda xy.x)\quad\text{but not}\quad u(\lambda xy.u),$$

— we do not want free variables in $u$ to become accidentally bound by some
abstraction in $t$, i.e.

$$(\lambda x.xy)[x/y]=(\lambda z.zy)[x/y]=\lambda z.zx\quad\text{but not}\quad\lambda x.xx.$$

Formally, the substitution $t[u/x]$ is defined by induction on $t$ by

$$
\begin{array}{rcll}
x[u/x]&=&u\\
y[u/x]&=&y&\text{if } y\neq x\\
(t_1\,t_2)[u/x]&=&(t_1[u/x])\,(t_2[u/x])\\
(\lambda x.t)[u/x]&=&\lambda x.t\\
(\lambda y.t)[u/x]&=&\lambda y.(t[u/x])&\text{if } y\neq x \text{ and } y\notin\mathrm{FV}(u)\\
(\lambda y.t)[u/x]&=&\lambda y'.(t\{y'/y\}[u/x])&\text{if } y\neq x,\ y\in\mathrm{FV}(u)\\
&&&\text{and } y'\notin\mathrm{FV}(t)\cup\mathrm{FV}(u)\cup\{x\}
\end{array}
$$

Because of the last line, the result of the substitution is not well-defined
as a term, since it depends on an arbitrary choice of a fresh variable $y'$,
but one can show that this is a well-defined operation on $\lambda$-terms up
to $\alpha$-equivalence. For this reason, as soon as we want to perform
substitutions, it only makes sense to consider the set of $\lambda$-terms
quotiented by the $\alpha$-equivalence relation: we will implicitly do so in
the following, and implicitly ensure that all the constructions we perform are
compatible with $\alpha$-equivalence. The only time where we should take
$\alpha$-conversion seriously is when dealing with implementation matters, see
section 3.6.2 for instance. Adopting this convention, the last three cases can
be replaced by

$$(\lambda y.t)[u/x]=\lambda y.(t[u/x])$$

where we suppose that $y\notin\mathrm{FV}(u)\cup\{x\}$, which we can always do
up to $\alpha$-conversion.
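
The inductive definition of substitution can be sketched in OCaml as follows.
This is an illustration under assumptions: the function names `fv`, `fresh`
and `subst` are ours, fresh variables are taken among the hypothetical names
x0, x1, ..., and the renaming $t\{y'/y\}$ is implemented by substituting the
variable $y'$ for $y$.

```ocaml
type term =
  | Var of string
  | App of term * term
  | Abs of string * term

(* Free variables of a term (as a list, possibly with duplicates). *)
let rec fv t =
  match t with
  | Var x -> [x]
  | App (t, u) -> fv t @ fv u
  | Abs (x, t) -> List.filter (fun y -> y <> x) (fv t)

(* First name among x0, x1, ... which does not belong to [avoid]. *)
let fresh avoid =
  let rec aux n =
    let x = "x" ^ string_of_int n in
    if List.mem x avoid then aux (n + 1) else x
  in
  aux 0

(* [subst t u x] computes t[u/x], one case per line of the definition. *)
let rec subst t u x =
  match t with
  | Var y -> if y = x then u else t
  | App (t1, t2) -> App (subst t1 u x, subst t2 u x)
  | Abs (y, _) when y = x -> t
  | Abs (y, t') when not (List.mem y (fv u)) -> Abs (y, subst t' u x)
  | Abs (y, t') ->
    (* y is free in u: rename it to a fresh y' in order to avoid capture. *)
    let y' = fresh (x :: fv t' @ fv u) in
    Abs (y', subst (subst t' (Var y') y) u x)
```

With this choice of fresh names, computing $(\lambda x.xy)[x/y]$ by
`subst (Abs ("x", App (Var "x", Var "y"))) (Var "x") "y"` returns the term
`Abs ("x0", App (Var "x0", Var "x"))`, i.e. $\lambda x_0.x_0x$, and not
$\lambda x.xx$.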


3.2 $\beta$-reduction


Consider a term of the form

$$(\lambda x.t)\,u\tag{3.1}$$

It intuitively consists of a function expecting an argument $x$ and returning a
result $t(x)$, which is given an argument $u$. We expect therefore the
computation to reach the term $t[u/x]$ consisting of the term $t$ where all the
free occurrences of $x$ have been replaced by $u$. This is what the notion of
$\beta$-reduction does and we write

$$(\lambda x.t)\,u\longrightarrow_\beta t[u/x]\tag{3.2}$$

to indicate that the term on the left reduces to the term on the right.
Actually, we want to be able to also perform this kind of reduction within a
term: we call a $\beta$-redex in a term $t$ a subterm of the form (3.1), and
the $\beta$-reduction consists in performing the replacement (3.2) in that
term.


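3.2.1 Definition. Formally, the $\beta$-reduction is defined as the smallest
binary relation $\longrightarrow_\beta$ on terms such that

$$
\frac{}{(\lambda x.t)\,u\longrightarrow_\beta t[u/x]}\;(\beta_s)
\qquad\qquad
\frac{t\longrightarrow_\beta t'}{\lambda x.t\longrightarrow_\beta\lambda x.t'}\;(\beta_\lambda)
$$

$$
\frac{t\longrightarrow_\beta t'}{t\,u\longrightarrow_\beta t'\,u}\;(\beta_l)
\qquad\qquad
\frac{u\longrightarrow_\beta u'}{t\,u\longrightarrow_\beta t\,u'}\;(\beta_r)
$$

A “proof tree” showing that $t\longrightarrow_\beta u$ is called a derivation
of it. For instance, a derivation of
$\lambda x.(\lambda y.y)xz\longrightarrow_\beta\lambda x.xz$ is

$$
\dfrac{\dfrac{\dfrac{}{(\lambda y.y)x\longrightarrow_\beta x}\;(\beta_s)}
{(\lambda y.y)xz\longrightarrow_\beta xz}\;(\beta_l)}
{\lambda x.(\lambda y.y)xz\longrightarrow_\beta\lambda x.xz}\;(\beta_\lambda)
$$

Such derivations are often useful to reason about $\beta$-reduction steps, by
induction on the derivation tree.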


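3.2.2 An example. For instance, we have the following sequence of
$\beta$-reductions, where each time we have underlined the $\beta$-redex:

$$
\begin{array}{rcl}
(\lambda x.y)(\underline{(\lambda z.zz)(\lambda t.t)})
&\longrightarrow_\beta&(\lambda x.y)(\underline{(\lambda t.t)(\lambda t.t)})\\
&\longrightarrow_\beta&\underline{(\lambda x.y)(\lambda t.t)}\\
&\longrightarrow_\beta&y
\end{array}
$$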
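
To experiment with such reductions, here is a sketch of a one-step reduction
function. It is an illustration under assumptions: among all the redexes of a
term it always picks the leftmost-outermost one (so that, on the above
example, it takes a different path from the one shown and reaches $y$ in a
single step), and the names `step`, `normalize`, `subst` and `fresh` are ours.

```ocaml
type term =
  | Var of string
  | App of term * term
  | Abs of string * term

let rec fv t =
  match t with
  | Var x -> [x]
  | App (t, u) -> fv t @ fv u
  | Abs (x, t) -> List.filter (fun y -> y <> x) (fv t)

(* First name among x0, x1, ... which does not belong to [avoid]. *)
let fresh avoid =
  let rec aux n =
    let x = "x" ^ string_of_int n in
    if List.mem x avoid then aux (n + 1) else x
  in
  aux 0

(* Capture-avoiding substitution t[u/x]. *)
let rec subst t u x =
  match t with
  | Var y -> if y = x then u else t
  | App (t1, t2) -> App (subst t1 u x, subst t2 u x)
  | Abs (y, _) when y = x -> t
  | Abs (y, t') when not (List.mem y (fv u)) -> Abs (y, subst t' u x)
  | Abs (y, t') ->
    let y' = fresh (x :: fv t' @ fv u) in
    Abs (y', subst (subst t' (Var y') y) u x)

(* Reduce the leftmost-outermost redex, or return None on a normal form.
   Each case corresponds to one of the four rules defining the relation. *)
let rec step t =
  match t with
  | Var _ -> None
  | App (Abs (x, t), u) -> Some (subst t u x)
  | App (t, u) ->
    (match step t with
     | Some t' -> Some (App (t', u))
     | None -> Option.map (fun u' -> App (t, u')) (step u))
  | Abs (x, t) -> Option.map (fun t' -> Abs (x, t')) (step t)

(* Iterate [step] at most [fuel] times. *)
let rec normalize fuel t =
  if fuel = 0 then t
  else
    match step t with
    | None -> t
    | Some t' -> normalize (fuel - 1) t'

(* The term (λx.y)((λz.zz)(λt.t)) of the example. *)
let ex =
  App (Abs ("x", Var "y"),
       App (Abs ("z", App (Var "z", Var "z")), Abs ("t", Var "t")))
```

The `fuel` argument bounds the number of steps, since nothing guarantees that
iterating `step` terminates on an arbitrary term.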


3.2.3 Reduction and redexes. Let us now make some basic observations
about how reductions interact with redexes. Reduction can create
$\beta$-redexes:

$$(\lambda x.xx)(\lambda y.y)\longrightarrow_\beta(\lambda y.y)(\lambda y.y)$$

In the initial term there was only one redex, and after reducing it a new redex
has appeared. Reduction can duplicate $\beta$-redexes:

$$(\lambda x.xx)((\lambda y.y)(\lambda z.z))\longrightarrow_\beta((\lambda y.y)(\lambda z.z))((\lambda y.y)(\lambda z.z))$$

The $\beta$-redex $(\lambda y.y)(\lambda z.z)$ occurs once in the initial term
and twice in the reduced one. Reduction can also erase $\beta$-redexes:

$$(\lambda x.y)((\lambda y.y)(\lambda z.z))\longrightarrow_\beta y$$

There were two redexes in the initial term, but there is none left after
reducing one of them.


3.2.4 Confluence. The reduction is not deterministic since some terms can
reduce in multiple ways:

$$\lambda y.y\;{}_\beta{\longleftarrow}\;(\lambda xy.y)((\lambda x.x)(\lambda z.z))\longrightarrow_\beta(\lambda xy.y)(\lambda z.z)$$

We thus have to be careful when studying properties of reduction: in
particular, we always have to specify whether those properties hold for some
reduction or every reduction. It can be noted that, although the two above
reductions differ, they end up with the same term: the term on the right above
reduces to $\lambda y.y$, which is the term on the left. This property is
called “confluence”: eventually, the order in which we choose to perform
$\beta$-reductions does not matter. This will be detailed and proved in
section 3.4.

3.2.5 $\beta$-reduction paths. A reduction path

$$t=t_0\longrightarrow_\beta t_1\longrightarrow_\beta t_2\longrightarrow_\beta\cdots\longrightarrow_\beta t_{n-1}\longrightarrow_\beta t_n=u$$

from $t$ to $u$ is a finite sequence of terms $t_0,t_1,\ldots,t_n$ such that
$t_i$ $\beta$-reduces to $t_{i+1}$ for every index $i$, with $t_0=t$ and
$t_n=u$. The natural number $n$ is called the length of the reduction path. We
write

$$t\longrightarrow^*_\beta u$$

when there exists a reduction path from $t$ to $u$ as above, and say that $t$
reduces in multiple steps to $u$. The relation $\longrightarrow^*_\beta$ on
terms is the reflexive and transitive closure of the relation
$\longrightarrow_\beta$.


3.2.6 Normalization. Some terms cannot reduce, they are called normal forms:

$$x\qquad\qquad x(\lambda y.\lambda z.y)$$

Those can be thought of as “values” or “results” for computations in
$\lambda$-calculus. Those terms are easily characterized:


Proposition 3.2.6.1. The $\lambda$-terms in normal form can be characterized
inductively as the terms of the form

$$\lambda x.t\qquad\text{or}\qquad x\,t_1\ldots t_n$$

where the $t_i$ and $t$ are normal forms.


Proof. We reason by induction on the size of $\lambda$-terms (the size being
the number of abstractions and applications). Suppose given a $\lambda$-term in
normal form: by definition it can be of the following forms.


— $x$: it is a normal form and it is of the expected form (with $n=0$).


— $\lambda x.t$: by the rule $(\beta_\lambda)$, this term is a normal form if
and only if $t$ is, i.e. by induction, if and only if $t$ is itself of the
expected form.


— $t\,u$: by the rules $(\beta_l)$ and $(\beta_r)$, if it is a normal form then
both $t$ and $u$ are in normal form. By induction, $t$ must be of the form
$\lambda x.t'$ or $x\,t_1\ldots t_n$ with $t'$ and $t_i$ of the expected form.
The first case is impossible: otherwise, $t\,u=(\lambda x.t')u$ would reduce by
$(\beta_s)$. Therefore, $t\,u$ is of the form $x\,t_1\ldots t_n\,u$ with $t_i$
and $u$ in normal form. Conversely, any term of this form is a normal form.
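
This characterization gives a direct way of testing whether a term is a normal
form, without trying to reduce it. The following is a minimal sketch; the
names `is_normal` and `is_neutral` are assumptions of this illustration.

```ocaml
type term =
  | Var of string
  | App of term * term
  | Abs of string * term

(* A normal form is either an abstraction whose body is a normal form, or
   a term of the form x t1 ... tn where every ti is a normal form. *)
let rec is_normal t =
  match t with
  | Abs (_, t) -> is_normal t
  | _ -> is_neutral t

(* Terms of the form x t1 ... tn with every ti in normal form. *)
and is_neutral t =
  match t with
  | Var _ -> true
  | App (t, u) -> is_neutral t && is_normal u
  | Abs _ -> false

(* x(λy.λz.y) is a normal form, (λx.xx)(λx.xx) is not. *)
let nf = App (Var "x", Abs ("y", Abs ("z", Var "y")))
let delta = Abs ("x", App (Var "x", Var "x"))
let omega = App (delta, delta)
```

Both `is_normal (Var "x")` and `is_normal nf` hold, whereas `is_normal omega`
does not.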


Having identified normal forms as the notion of “result” in the
$\lambda$-calculus, it is natural to study whether every term will eventually
give rise to a result; we will see that this is not the case. A term $t$ is
weakly normalizing when it can reduce to a normal form, i.e. there exists a
normal form $u$ such that $t\longrightarrow^*_\beta u$. It is strongly
normalizing when every sequence of reductions will eventually reduce to a
normal form. In other words, there is no infinite sequence of reductions
starting from $t$:

$$t=t_0\longrightarrow_\beta t_1\longrightarrow_\beta t_2\longrightarrow_\beta\cdots$$

Not every term is strongly normalizing. For instance, the term

$$\Omega=(\lambda x.xx)(\lambda x.xx)$$

reduces to itself and thus infinitely:

$$(\lambda x.xx)(\lambda x.xx)\longrightarrow_\beta(\lambda x.xx)(\lambda x.xx)\longrightarrow_\beta(\lambda x.xx)(\lambda x.xx)\longrightarrow_\beta\cdots$$

As a variant, the following term keeps growing during the reduction:

$$
\begin{array}{rcl}
(\lambda x.xx)(\lambda y.yyy)
&\longrightarrow_\beta&(\lambda y.yyy)(\lambda y.yyy)\\
&\longrightarrow_\beta&(\lambda y.yyy)(\lambda y.yyy)(\lambda y.yyy)
\longrightarrow_\beta\cdots
\end{array}
$$

Clearly, a strongly normalizing term is weakly normalizing, but the converse
does not hold. For instance, the term

$$(\lambda x.y)((\lambda x.xx)(\lambda x.xx))$$

can reduce to $y$, which is a normal form, and is thus weakly normalizing. It
can also reduce to itself and is thus not strongly normalizing.
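
The difference between the two notions can be observed by comparing two ways of
choosing the redex to reduce. The sketch below rests on stated assumptions: the
substitution is the naive one, without renaming, which is only safe here
because every term being substituted is closed; the names `outer`, `inner` and
`steps` are ours; and non-termination is approximated by exhausting a bounded
number of steps.

```ocaml
type term =
  | Var of string
  | App of term * term
  | Abs of string * term

(* Naive substitution t[u/x]: bound variables are never renamed, which is
   only correct when u is closed (an assumption of this sketch). *)
let rec subst t u x =
  match t with
  | Var y -> if y = x then u else t
  | App (t1, t2) -> App (subst t1 u x, subst t2 u x)
  | Abs (y, t') -> if y = x then t else Abs (y, subst t' u x)

(* Reduce the leftmost-outermost redex. *)
let rec outer t =
  match t with
  | Var _ -> None
  | App (Abs (x, t), u) -> Some (subst t u x)
  | App (t, u) ->
    (match outer t with
     | Some t' -> Some (App (t', u))
     | None -> Option.map (fun u' -> App (t, u')) (outer u))
  | Abs (x, t) -> Option.map (fun t' -> Abs (x, t')) (outer t)

(* Reduce inside the argument first, then inside the function, and the
   redex at the root only when both are normal. *)
let rec inner t =
  match t with
  | Var _ -> None
  | Abs (x, t) -> Option.map (fun t' -> Abs (x, t')) (inner t)
  | App (t, u) ->
    (match inner u with
     | Some u' -> Some (App (t, u'))
     | None ->
       match inner t with
       | Some t' -> Some (App (t', u))
       | None ->
         match t with
         | Abs (x, t) -> Some (subst t u x)
         | _ -> None)

(* Number of steps to reach a normal form, or None if [fuel] steps were
   performed without reaching one. *)
let rec steps strategy fuel t =
  match strategy t with
  | None -> Some 0
  | Some t' ->
    if fuel = 0 then None
    else Option.map succ (steps strategy (fuel - 1) t')

let delta = Abs ("x", App (Var "x", Var "x"))
let omega = App (delta, delta)
let t = App (Abs ("x", Var "y"), omega)   (* (λx.y)Ω *)
```

Here `steps outer 100 t` is `Some 1`, while `steps inner 100 t` and
`steps outer 100 omega` are both `None`: the term $(\lambda x.y)\Omega$ reaches
its normal form only if the outer redex is eventually chosen.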


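3.2.7 $\beta$-equivalence. We write $=_\beta$ for the $\beta$-equivalence,
which is the smallest equivalence relation containing
$\longrightarrow_\beta$. It is not difficult to show that this relation can be
characterized as the symmetric and transitive closure of the relation
$\longrightarrow^*_\beta$: we have

$$t=_\beta u$$

whenever there exist terms $t_0,\ldots,t_{2n}$ such that

$$t=t_0\;{}^*_\beta{\longleftarrow}\;t_1\longrightarrow^*_\beta t_2\;{}^*_\beta{\longleftarrow}\;t_3\longrightarrow^*_\beta\cdots\;{}^*_\beta{\longleftarrow}\;t_{2n-1}\longrightarrow^*_\beta t_{2n}=u$$

The notion of $\beta$-equivalence is very natural on $\lambda$-terms: it
identifies two terms whenever they give rise to the same result. Two
$\beta$-equivalent terms are sometimes also said to be $\beta$-convertible.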


3.2.8 $\eta$-equivalence. In OCaml, the functions sin and fun x -> sin x are
clearly “the same”: one can be used in place of the other without changing
anything, both will compute the sine of their input. However, they are not
identical: their syntax differs. In $\lambda$-calculus, the
$\eta$-equivalence relation relates two such terms: it identifies a term $t$
(which is a function, since everything is a function in $\lambda$-calculus)
with the function which to $x$ associates $t\,x$. Formally, the
$\eta$-equivalence relation $=_\eta$ is the smallest congruence such that

$$t=_\eta\lambda x.t\,x$$

for every term $t$ (where $x$ is not free in $t$).

By analogy with $\beta$-reduction, it will sometimes be useful to consider the
$\eta$-reduction relation which is the smallest congruence such that

$$\lambda x.t\,x\longrightarrow_\eta t$$

for every term $t$. The opposite relation

$$t\longrightarrow_\eta\lambda x.t\,x$$

is also useful and called $\eta$-expansion. We have that $\eta$-equivalence is
the reflexive, symmetric and transitive closure of this relation.

Finally, we write $=_{\beta\eta}$ for the $\beta\eta$-equivalence relation,
which is the smallest equivalence relation containing both $=_\beta$ and
$=_\eta$. In this book, we will mostly focus on $\beta$-equivalence, although
most proofs generalize to $\beta\eta$-equivalence.


3.3 Computing in the $\lambda$-calculus


The $\lambda$-calculus contains only functions. Even though we have removed
most of what is usually found in a programming language, we will see that it
is far from trivial as a programming language. In order to do so, we will
gradually show that usual programming constructions can be encoded in
$\lambda$-calculus.


3.3.1 Identity. A first interesting term is the identity $\lambda$-term

$$\mathrm{I}=\lambda x.x$$

It has the property that, for any term $t$, we have

$$\mathrm{I}\,t\longrightarrow_\beta t$$


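3.3.2 Booleans. The booleans true and false can respectively be encoded as

$$\mathrm{T}=\lambda xy.x\qquad\qquad\mathrm{F}=\lambda xy.y$$

With this encoding, the usual if-then-else conditional construction can be
encoded as

$$\mathrm{if}=\lambda bxy.bxy$$

Namely, we have

$$\mathrm{if}\,\mathrm{T}\,t\,u\longrightarrow^*_\beta t\qquad\qquad\mathrm{if}\,\mathrm{F}\,t\,u\longrightarrow^*_\beta u$$

For instance, the first reduction is

$$
\begin{array}{rcl}
\mathrm{if}\,\mathrm{T}\,t\,u=(\lambda bxy.bxy)(\lambda xy.x)\,t\,u
&\longrightarrow_\beta&(\lambda xy.(\lambda xy.x)xy)\,t\,u\\
&\longrightarrow_\beta&(\lambda y.(\lambda xy.x)ty)\,u\\
&\longrightarrow_\beta&(\lambda xy.x)\,t\,u\\
&\longrightarrow_\beta&(\lambda y.t)\,u\\
&\longrightarrow_\beta&t
\end{array}
$$

and the second one is similar.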
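From there, the usual boolean operations of conjunction, disjunction and
negation are easily defined by

$$\mathrm{and}=\lambda xy.x\,y\,\mathrm{F}\qquad\qquad\mathrm{or}=\lambda xy.x\,\mathrm{T}\,y\qquad\qquad\mathrm{not}=\lambda x.x\,\mathrm{F}\,\mathrm{T}$$

For instance, one can check that we have

$$\mathrm{and}\,\mathrm{T}\,\mathrm{T}\longrightarrow^*_\beta\mathrm{T}\qquad\mathrm{and}\,\mathrm{T}\,\mathrm{F}\longrightarrow^*_\beta\mathrm{F}\qquad\mathrm{and}\,\mathrm{F}\,\mathrm{T}\longrightarrow^*_\beta\mathrm{F}\qquad\mathrm{and}\,\mathrm{F}\,\mathrm{F}\longrightarrow^*_\beta\mathrm{F}$$

Above, we have defined conjunction (and other operations) from conditionals,
which is quite classical. In OCaml, we would have written (the function is
named and_ here because and is a reserved word of the language)

let and_ x y = if x then y else false

which translates in $\lambda$-calculus as

$$\mathrm{and}=\lambda xy.\mathrm{and}\,x\,y=\lambda xy.\mathrm{if}\,x\,y\,\mathrm{F}\longrightarrow^*_\beta\lambda xy.x\,y\,\mathrm{F}$$

and suggests the definition we made. There are of course other possible
implementations, e.g.

$$\mathrm{and}=\lambda xy.x\,y\,x$$

In the above implementations, we only guarantee that the expected reductions
will happen when the arguments are booleans, but nothing is specified when
the arguments are arbitrary $\lambda$-terms.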
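
These encodings can be tried directly in OCaml, whose functions are typed
$\lambda$-terms. The sketch below is an illustration only: the names `ctrue`,
`cfalse`, `cif`, `cand`, `cor`, `cnot` and `to_bool` are ours (the prefix c
avoids clashes with OCaml keywords and values), and the typing discipline of
OCaml means that only well-behaved uses such as the ones shown are accepted.

```ocaml
(* Church booleans as OCaml functions. *)
let ctrue x _ = x            (* T = λxy.x *)
let cfalse _ y = y           (* F = λxy.y *)
let cif b x y = b x y        (* if = λbxy.bxy *)

let cand x y = x y cfalse    (* and = λxy.x y F *)
let cor x y = x ctrue y      (* or = λxy.x T y *)
let cnot x = x cfalse ctrue  (* not = λx.x F T *)

(* Convert a Church boolean to a native boolean by applying it to the
   native values true and false. *)
let to_bool b = b true false
```

For instance, `to_bool (cand ctrue cfalse)` evaluates to `false` and
`cif ctrue 1 2` evaluates to `1`.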


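3.3.3 Pairs. The encoding of pairs can be deduced from booleans. Namely, we
can encode the pairing operator as

$$\mathrm{pair}=\lambda xyb.\mathrm{if}\,b\,x\,y$$

When applied to two terms $t$ and $u$, it reduces to

$$\mathrm{pair}\,t\,u\longrightarrow^*_\beta\lambda b.\mathrm{if}\,b\,t\,u$$

which can be thought of as an encoding of the pair $(t,u)$. In order to recover
the components of the pair, we can simply apply it to either $\mathrm{T}$ or
$\mathrm{F}$:

$$(\mathrm{pair}\,t\,u)\,\mathrm{T}\longrightarrow^*_\beta t\qquad\qquad(\mathrm{pair}\,t\,u)\,\mathrm{F}\longrightarrow^*_\beta u$$

We thus define the two projections as

$$\mathrm{fst}=\lambda p.p\,\mathrm{T}\qquad\qquad\mathrm{snd}=\lambda p.p\,\mathrm{F}$$

and we have, as expected

$$\mathrm{fst}\,(\mathrm{pair}\,t\,u)\longrightarrow^*_\beta t\qquad\qquad\mathrm{snd}\,(\mathrm{pair}\,t\,u)\longrightarrow^*_\beta u$$

More generally, $n$-uples can be encoded as

$$\mathrm{uple}^n=\lambda x_1\ldots x_n b.b\,x_1\ldots x_n$$

with the associated projections

$$\mathrm{proj}^n_i=\lambda p.p\,(\lambda x_1\ldots x_n.x_i)$$

and one checks that

$$\mathrm{proj}^n_i\,(\mathrm{uple}^n\,t_1\ldots t_n)\longrightarrow^*_\beta t_i$$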
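
The same encodings can be written as OCaml functions. This is a sketch under
assumptions: the names `pair`, `first`, `second`, `uple3` and `proj3_2` are
ours, and `pair` is written with the conditional already simplified, i.e. as
$\lambda xyb.b\,x\,y$.

```ocaml
(* Church pairs: a pair waits for a selector b and feeds it both
   components. *)
let pair x y b = b x y              (* pair = λxyb.b x y *)
let first p = p (fun x _ -> x)      (* fst = λp.p T *)
let second p = p (fun _ y -> y)     (* snd = λp.p F *)

(* The pair is eta-expanded (see 3.2.8) so that OCaml keeps it polymorphic
   in the result type of the selector, which allows using both
   projections on components of different types. *)
let p b = pair 1 "one" b

(* Triples and the second projection, as instances of uple and proj. *)
let uple3 x y z b = b x y z
let proj3_2 p = p (fun _ y _ -> y)
```

With these definitions, `first p` is `1`, `second p` is `"one"` and
`proj3_2 (uple3 'a' 'b' 'c')` is `'b'`.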


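3.3.4 Natural numbers. Given $\lambda$-terms $f$ and $x$, and a natural number
$n\in\mathbb{N}$, we write $f^n x$ for the $\lambda$-term
$f(f(\ldots(f\,x)))$ with $n$ occurrences of $f$:

$$f^0x=x\qquad\qquad f^{n+1}x=f(f^n x)$$

The $n$-th Church numeral is the $\lambda$-term

$$\underline{n}=\lambda fx.f^n x=\lambda fx.f(f(\ldots(f\,x)))$$

In other words, the $\lambda$-term $\underline{n}$ is such that, when applied
to arguments $f$ and $x$, it iterates $n$ times the application of $f$ to $x$.
For low values of $n$, we have

$$\underline{0}=\lambda fx.x\qquad\underline{1}=\lambda fx.f\,x\qquad\underline{2}=\lambda fx.f(f\,x)\qquad\underline{3}=\lambda fx.f(f(f\,x))$$

Successor. The successor function can be encoded as

$$\mathrm{succ}=\lambda nfx.f(n\,f\,x)$$

which applies $f$ to $f^n x$. It behaves as expected since

$$
\begin{array}{rcl}
\mathrm{succ}\,\underline{n}&=&(\lambda nfx.f(n\,f\,x))(\lambda fx.f^n x)\\
&\longrightarrow_\beta&\lambda fx.f((\lambda fx.f^n x)\,f\,x)\\
&\longrightarrow_\beta&\lambda fx.f((\lambda x.f^n x)\,x)\\
&\longrightarrow_\beta&\lambda fx.f(f^n x)\\
&=&\lambda fx.f^{n+1}x=\underline{n+1}
\end{array}
$$

Another natural possible definition of successor would be

$$\mathrm{succ}=\lambda nfx.n\,f\,(f\,x)$$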


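Arithmetic functions. The addition, multiplication and exponentiation can
similarly be defined as

$$\mathrm{add}=\lambda mn.m\,\mathrm{succ}\,n\qquad\mathrm{mul}=\lambda mn.m\,(\mathrm{add}\,n)\,\underline{0}\qquad\mathrm{exp}=\lambda mn.n\,(\mathrm{mul}\,m)\,\underline{1}$$

or, alternatively, as

$$\mathrm{add}=\lambda mnfx.m\,f\,(n\,f\,x)\qquad\mathrm{mul}=\lambda mnfx.m\,(n\,f)\,x\qquad\mathrm{exp}=\lambda mn.n\,m$$

It can be checked that addition is such that, for every $m,n\in\mathbb{N}$, we
have $\mathrm{add}\,\underline{m}\,\underline{n}=_\beta\underline{m+n}$: with
the second definition, it computes the function which to $x$ associates $f$
applied $m$ times to $f$ applied $n$ times to $x$, i.e. $f^{m+n}x$. And
similarly for other operations.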
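
The second series of definitions can be checked experimentally in OCaml. The
sketch below is an illustration under assumptions: the conversion functions
`to_int` and `of_int` are ours (with `of_int` only meant for non-negative
arguments), and the definition of `succ` deliberately shadows the standard one
within this snippet.

```ocaml
(* Church numerals as OCaml functions. *)
let zero _ x = x                      (* 0 = λfx.x *)
let succ n f x = f (n f x)            (* succ = λnfx.f (n f x) *)
let add m n f x = m f (n f x)         (* add = λmnfx.m f (n f x) *)
let mul m n f x = m (n f) x           (* mul = λmnfx.m (n f) x *)
let exp m n = n m                     (* exp = λmn.n m *)

let two f x = f (f x)
let three f x = f (f (f x))

(* Convert a Church numeral to a native integer by iterating +1 on 0. *)
let to_int n = n (fun k -> k + 1) 0

(* Church numeral of a native integer k, assumed non-negative. *)
let rec of_int k f x = if k = 0 then x else f (of_int (k - 1) f x)
```

For instance, `to_int (add two three)` is `5`, `to_int (mul two three)` is `6`
and `to_int (exp two three)` is `8`.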


Comparisons. The test-if-zero function takes a natural number $n$ as argument
and returns the boolean true or false depending on whether $n$ is $0$ or not.
It can be encoded as

$$\mathrm{iszero}=\lambda nxy.n\,(\lambda z.y)\,x$$

Given $n$, $x$ and $y$, it applies the function $f=\lambda z.y$ $n$ times to
$x$: if the function is applied $0$ times then $x$ is returned, otherwise if
the function is applied at least once then $y$ is returned.

The predecessor function can also be encoded although it is more difficult
(this is detailed below):

$$\mathrm{pred}=\lambda nfx.n\,(\lambda gh.h\,(g\,f))\,(\lambda y.x)\,(\lambda y.y)$$

It allows defining subtraction as

$$\mathrm{sub}=\lambda mn.n\,\mathrm{pred}\,m$$

where, by convention, the result of $m-n$ is $0$ when $m<n$. From there, we can
define comparisons of natural numbers such as the $\leq$ relation since
$m\leq n$ is equivalent to $m-n=0$:

$$\mathrm{leq}=\lambda mn.\mathrm{iszero}\,(\mathrm{sub}\,m\,n)$$

Exercise 3.3.4.1. The Ackermann function [Ack28] from pairs of natural numbers
to natural numbers is the function $A$ defined by

$$
\begin{array}{rcl}
A(0,n)&=&n+1\\
A(m+1,0)&=&A(m,1)\\
A(m+1,n+1)&=&A(m,A(m+1,n))
\end{array}
$$

Show that, in $\lambda$-calculus, it can be implemented as

$$\mathrm{ack}=\lambda m.m\,(\lambda fn.n\,f\,(f\,\underline{1}))\,\mathrm{succ}$$


Predecessor. We are now going to see how we can implement the predecessor
function mentioned above. Before going into that, let us see how we can
implement the Fibonacci sequence $f_n$ defined by $f_0=0$, $f_1=1$ and
$f_{n+1}=f_n+f_{n-1}$. A naive implementation would be


let rec fib n =
  if n = 0 then 0
  else if n = 1 then 1
  else fib (n-1) + fib (n-2)

This function is highly inefficient because many computations are performed
multiple times. For instance, to compute $f_n$, we compute both $f_{n-1}$ and
$f_{n-2}$, but the computation of $f_{n-1}$ will require computing $f_{n-2}$
another time, and so on. The usual strategy to improve that consists in
computing two successive values $(f_{n-1},f_n)$ of the Fibonacci sequence at a
time. Given such a pair, the next pair is computed by

$$(f_n,f_{n+1})=(f_n,f_{n-1}+f_n)$$

We thus define the function


let fib_fun (q,p) = (p,p+q)


which computes the next pair depending on the current pair. If we iterate $n$
times this function on the pair $(f_0,f_1)=(0,1)$, we obtain the pair
$(f_n,f_{n+1})$ and we can thus obtain the $n$-th term of the Fibonacci
sequence by projecting to the first element:


let fib n = fst (iter n fib_fun (0,1))

where the function iter applies a function f n times to some element x:


let rec iter n f x =
  if n = 0 then x
  else f (iter (n-1) f x)

Now, suppose that we want to implement the predecessor function on natural 
numbers without using subtraction. Given n ∈ ℕ, there is one value for which
we obviously know the predecessor: the predecessor of n + 1 is n. We will use
this fact, and the above trick in order to remember the value of the previous
predecessor, which is the n − 1 we are looking for! Let us write p_n for the
predecessor of n. We can compute the pair (p_n, p_{n+1}) of two successive values
from the previous pair (p_{n−1}, p_n) by


(p_n, p_{n+1}) = (p_n, p_n + 1)
We thus define the function 
let pred_fun (q,p) = (p,p+1)


If we iterate this function n times starting from the pair (p_0, p_1) = (0,0), we
obtain the pair (p_n, p_{n+1}) and can thus compute p_n as its first component:


let pred n = fst (iter n pred_fun (0,0)) 
In λ-calculus, this translates as

pred = λn.fst (n (λx.pair (snd x) (succ (snd x))) (pair 0 0))
The formula for predecessor provided above is a variant of this one. 
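
The same construction can be tried with Church numerals encoded in OCaml. The sketch below is only illustrative: the record type church (with a polymorphic field it), of_int and to_int are names we introduce here, and native OCaml pairs are used instead of encoded ones.

```ocaml
(* A Church numeral iterates a function a given number of times; the polymorphic
   field allows using the same numeral at several types. *)
type church = { it : 'a. ('a -> 'a) -> 'a -> 'a }

let zero = { it = (fun _ x -> x) }
let succ n = { it = (fun f x -> f (n.it f x)) }

(* Iterate (q, p) -> (p, p + 1) n times from (0, 0) and keep the first component. *)
let pred n = fst (n.it (fun (_, p) -> (p, succ p)) (zero, zero))

(* Conversions with native integers, for testing purposes only. *)
let rec of_int k = if k = 0 then zero else succ (of_int (k - 1))
let to_int n = n.it (fun k -> k + 1) 0
```

For instance, to_int (pred (of_int 5)) evaluates to 4, and the predecessor of zero is zero, as in the definition above.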
3.3.5 Fixpoints. In order to define more elaborate functions on natural numbers
such as the factorial, we need to have the possibility of defining functions
recursively. This can be achieved in λ-calculus thanks to the so-called fixpoint
combinators. In mathematics, a fixpoint of a function f is a value x such that
f(x) = x. Note that such a value may or may not exist: for instance f = x ↦ x²
has 0 and 1 as fixpoints whereas f = x ↦ x + 1 has no fixpoint.
Similarly, in λ-calculus a fixpoint for a term t is a term u such that


t u =_β u
A distinguishing feature of the λ-calculus is that
1. every term t admits a fixpoint,
2. this fixpoint can be computed within λ-calculus: there is a term Y such
that Y t is a fixpoint of t:


t (Y t) =_β Y t
A term Y as above is called a fixpoint combinator.


Fixpoints in OCaml. Before giving a λ-term which is a fixpoint operator, let us see
how it can be implemented in OCaml and used to program recursive functions.
In practice, we will look for a function Y such that


Y t →_β t (Y t)


Note that such a function is necessarily non-terminating since there is an infinite
sequence of reductions


Y t →_β t (Y t) →_β t (t (Y t)) →_β t (t (t (Y t))) →_β ···


but it might still be useful since there might be other possible reductions
reaching a normal form. Following the conventions, we will write fix instead of
Y. A function which behaves as proposed is easily implemented:
let rec fix f = f (fix f)


Let us see how this can be used in order to implement the factorial function 
without explicitly resorting to recursion. The factorial function satisfies 0! = 1 
and n! = n × (n−1)! so that it can be implemented as


let rec fact n =
  if n = 0 then 1 else n * fact (n-1)


In order to implement it without using recursion, the trick is to first transform 
this function into one which takes, as first argument, a function f which is to 
be the factorial itself, and replace recursive calls by calls to this function: 


let fact_fun f n =
  if n = 0 then 1 else n * f (n - 1)
We then expect the factorial function to be obtained as its fixpoint: 
let fact = fix fact_fun 


Namely, this function will reduce to fact_fun (fix fact_fun), i.e. the above 
function where f was replaced by the function itself, as expected. However, if 
we try to define the function fact in this way, OCaml complains: 


Stack overflow during evaluation (looping recursion?). 


This is because OCaml always evaluates arguments first, so that it will fall into 
the infinite sequence of reductions mentioned above (the stack will grow at each 
recursive call and will exceed the maximal authorized value): 


fix fact_fun →_β fact_fun (fix fact_fun) →_β ...


The trick in order to avoid that is to add an argument in
the definition of fix: 


let rec fix f x = f (fix f) x 


and now the above definition of factorial computes as expected: this time, the 
argument fix f does not evaluate further because it is a function which is still 
expecting its second argument. It is interesting to note that the two definitions 
of fix (the looping one and the working one) are η-equivalent, see section 3.2.8,
so that two η-equivalent terms can act differently depending on the properties
we consider. 
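
Putting the pieces together, here is a small runnable sketch of the working (η-expanded) fixpoint operator. The greatest common divisor example (gcd_fun, gcd) is ours and is only there to show that the same operator handles another recursive function.

```ocaml
(* The eta-expanded fixpoint operator: fix f is a function still waiting for x,
   so its evaluation stops there under call-by-value. *)
let rec fix f x = f (fix f) x

(* Factorial, with the recursive call abstracted as the argument f. *)
let fact_fun f n = if n = 0 then 1 else n * f (n - 1)
let fact = fix fact_fun

(* Another example (not from the text): Euclid's algorithm on a pair. *)
let gcd_fun f (a, b) = if b = 0 then a else f (b, a mod b)
let gcd = fix gcd_fun
```

Neither fact nor gcd uses let rec: all the recursion is provided by fix.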


Fixpoints in λ-calculus. The above definition of fix does not easily translate
to λ-calculus, because there is no simple way of defining recursive functions. A
possible implementation of the fixpoint combinator can be obtained by a variant
on the looping term Ω (see section 3.2.6). The Curry fixpoint combinator is


Y = λf.(λx.f (x x)) (λx.f (x x))
Namely, given a term t, we have
Y t = (λf.(λx.f (x x)) (λx.f (x x))) t
    →_β (λx.t (x x)) (λx.t (x x))
    →_β t ((λx.t (x x)) (λx.t (x x)))
    ←_β t (Y t)




which shows that we indeed have Y t =_β t (Y t), i.e. Y t is a fixpoint of t.
Another possible fixpoint combinator is Turing’s one defined as


Θ = (λfx.x (f f x)) (λfx.x (f f x))
which satisfies, for any term t,
Θ t →*_β t (Θ t)


(we have here a proper sequence of β-reductions, as opposed to a mere β-equivalence
for Curry’s combinator).
The OCaml definition of the factorial


let fact = fix (fun f n -> if n = 0 then 1 else n * f (n - 1))
translates into λ-calculus as
fact = Y (λfn.if (iszero n) 1 (mul n (f (pred n))))
For instance, writing F for the argument of Y in this definition, the factorial
of 2 computes as
fact 2 = (Y F) 2
       =_β F (Y F) 2
       →*_β if (iszero 2) 1 (mul 2 ((Y F) (pred 2)))
       →*_β if false 1 (mul 2 ((Y F) (pred 2)))
       →*_β mul 2 ((Y F) (pred 2))
       →*_β mul 2 ((Y F) 1)
       ⋮
       →*_β mul 2 (mul 1 1)
       →*_β 2


Remark 3.3.5.1. Following Church’s initial intuition when introducing the λ-calculus,
we can think of λ-terms as describing sets, in the sense of set theory (see
section 5.3). Namely, a set t can be thought of as a predicate, i.e. a function
which takes an element u as argument and returns true or false depending on
whether the element u belongs to t or not. Following this point of view, instead
of writing u ∈ t, we write t u. Similarly, given a predicate t, the set {x | t} is
naturally written λx.t:


set theory | λ-calculus
u ∈ t      | t u
{x | t}    | λx.t


In this context, the paradoxical Russell set 
r = {x | ¬(x ∈ x)}
of naive set theory, see section 5.3.1, is written as


r = λx.¬(x x)
This set has the property that r ∈ r iff ¬(r ∈ r), i.e.
r r = ¬(r r)


In other words r r is a fixpoint for ¬. Generalizing this to any f instead of ¬,
we recover the definition of Y:


Y = λf.r r
with r = λx.f (x x). In this sense, the fixpoint combinator is the Russell paradox
in disguise!
Curry’s combinator in OCaml. If we try to implement Curry’s combinator in
OCaml:
let fix = fun f -> (fun x -> f (x x)) (fun x -> f (x x)) 


we get a typing error concerning the variable x. Namely, x is applied to something
in the above expression, so it should be of type 'a -> 'b, but its argument
is x itself, which imposes that we should have 'a = 'a -> 'b. The type of x
should thus be the infinite type


(((... -> 'b) -> 'b) -> 'b) -> 'b


which is not allowed by default. There are two ways around this. 
The first one consists in using the -rectypes option of OCaml in order to allow
such types. If we use this function to define the factorial by


let fact = fix fact_fun 


we get a stack overflow, meaning that the program is looping, which can be
solved with an η-expansion (we have already seen this trick above). We can
thus define instead


let fix = fun f -> (fun x y -> f (x x) y) (fun x y -> f (x x) y) 


and now the definition of factorial works as expected. 

The second one, if you do not want to use some exotic flag, consists in using 
a recursive type, which allows such recursions in types. Namely, we can define 
the type 


type 'a t = Arr of ('a t -> 'a)
with which we can define the fixpoint operator as


let fix f =
  (fun x y -> f (arr x x) y) (Arr (fun x y -> f (arr x x) y))


where we use the shorthand
let arr (Arr f) = f
In the same spirit, the Turing fixpoint combinator can be implemented as


let turing =
  let t f x y = x (arr f f x) y in
  t (Arr t)
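
For convenience, here are the above definitions assembled in dependency order (arr has to come before its uses), together with the factorial obtained from each combinator. This is a sketch for experimentation; the names fact and fact' are ours.

```ocaml
(* Recursive type wrapping self-application: no -rectypes flag is needed. *)
type 'a t = Arr of ('a t -> 'a)

(* Unwrap the constructor. *)
let arr (Arr f) = f

(* Curry's combinator, eta-expanded so that evaluation does not loop. *)
let fix f =
  (fun x y -> f (arr x x) y) (Arr (fun x y -> f (arr x x) y))

(* Turing's combinator, written in the same style. *)
let turing =
  let t f x y = x (arr f f x) y in
  t (Arr t)

let fact_fun f n = if n = 0 then 1 else n * f (n - 1)

let fact = fix fact_fun
let fact' = turing fact_fun
```

Both fact and fact' are expected to compute the usual factorial.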
3.3.6 Turing completeness. The previous encodings of usual functions should
make it more or less clear that the λ-calculus is a full-fledged programming language.
In particular, from the classical undecidability results [Tur37] we can
deduce:


Theorem 3.3.6.1 (Undecidability). The following problems are undecidable:
— whether two terms are β-equivalent,
— whether a term can β-reduce to a normal form.


In order to make this result more precise, we should encode Turing machines
into λ-terms. Instead of doing this directly, we can rather encode recursive
functions, which are already known to have the same expressiveness as Turing
machines. The class of recursive functions is the smallest class of partially
defined functions f : ℕ^k → ℕ for some k ∈ ℕ, which contains the zero constant
function z, the successor function s and the projections p^k_i, for k ∈ ℕ and
1 ≤ i ≤ k:


z : ℕ^0 → ℕ          s : ℕ^1 → ℕ            p^k_i : ℕ^k → ℕ
    () ↦ 0               (n) ↦ n + 1             (n_1, ..., n_k) ↦ n_i


and is closed under
— composition: given recursive functions
f : ℕ^l → ℕ and g_1, ..., g_l : ℕ^k → ℕ
the function


comp^f_{g_1,...,g_l} : ℕ^k → ℕ
(n_1, ..., n_k) ↦ f(g_1(n_1, ..., n_k), ..., g_l(n_1, ..., n_k))


is also recursive,


— primitive recursion: given recursive functions


f : ℕ^k → ℕ and g : ℕ^{k+2} → ℕ
the function
rec_{f,g} : ℕ^{k+1} → ℕ
(0, n_1, ..., n_k) ↦ f(n_1, ..., n_k)
(n_0 + 1, n_1, ..., n_k) ↦ g(rec_{f,g}(n_0, n_1, ..., n_k), n_0, n_1, ..., n_k)


is also recursive,
— minimization: given a recursive function f : ℕ^{k+1} → ℕ the function
min_f : ℕ^k → ℕ


which to (n_1, ..., n_k) ∈ ℕ^k associates the smallest n_0 ∈ ℕ such that
f(n_0, n_1, ..., n_k) = 0 is also recursive.
The presence of minimization is the reason why we need to consider partially
defined functions. 
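
To make the three closure operations concrete, here is a sketch in OCaml where a function ℕ^k → ℕ is represented as a function of type int list -> int (all arguments being assumed to be natural numbers). The representation and the names comp, prim_rec, minimize, proj, add, double and ceil_sqrt are our own choices for illustration.

```ocaml
(* Composition: apply every g to the arguments, then f to the results. *)
let comp f gs ns = f (List.map (fun g -> g ns) gs)

(* Primitive recursion on the first argument n0. *)
let rec prim_rec f g = function
  | [] -> invalid_arg "prim_rec: expects at least one argument"
  | 0 :: ns -> f ns
  | n0 :: ns -> g (prim_rec f g ((n0 - 1) :: ns) :: (n0 - 1) :: ns)

(* Minimization: smallest n0 such that f (n0, ns) = 0. This loops forever when
   there is no such n0, which is why partial functions have to be considered. *)
let minimize f ns =
  let rec search n0 = if f (n0 :: ns) = 0 then n0 else search (n0 + 1) in
  search 0

(* i-th projection, with arguments numbered from 1. *)
let proj i ns = List.nth ns (i - 1)

(* Addition by primitive recursion: add (0, m) = m, add (n+1, m) = add (n, m) + 1. *)
let add = prim_rec (proj 1) (fun args -> proj 1 args + 1)

(* Doubling by composition: double n = add (n, n). *)
let double = comp add [proj 1; proj 1]

(* Rounded-up square root by minimization: smallest n0 with n0 * n0 >= n. *)
let ceil_sqrt =
  minimize (fun args -> if proj 1 args * proj 1 args >= proj 2 args then 0 else 1)
```

For instance, add [3; 4] evaluates to 7 and ceil_sqrt [10] to 4.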

A function f : ℕ^k → ℕ is definable by a λ-term t when, for every tuple of
natural numbers (n_1, ..., n_k) ∈ ℕ^k, we have


t \underline{n_1} ... \underline{n_k} →*_β \underline{f(n_1, ..., n_k)}


where \underline{n} is the Church numeral associated to n.


Theorem 3.3.6.2 (Kleene). The functions definable in λ-calculus are precisely
the recursive ones.


Proof. The terms constructed in section 3.3 easily allow us to encode total recursive
functions f as λ-terms ⟦f⟧: we define


⟦z⟧ = 0 = λfx.x        ⟦s⟧ = succ = λnfx.f (n f x)        ⟦p^k_i⟧ = λx_1 ... x_k.x_i
composition is given by


⟦comp^f_{g_1,...,g_l}⟧ = λx_1 ... x_k.⟦f⟧ (⟦g_1⟧ x_1 ... x_k) ... (⟦g_l⟧ x_1 ... x_k)


primitive recursion by


⟦rec_{f,g}⟧ = Y (λr x_0 x_1 ... x_k.if (iszero x_0)
                  (⟦f⟧ x_1 ... x_k)
                  (⟦g⟧ (r (pred x_0) x_1 ... x_k) (pred x_0) x_1 ... x_k))


and minimization by


⟦min_f⟧ = Y (λr x_0 x_1 ... x_k.if (iszero (⟦f⟧ x_0 x_1 ... x_k)) x_0 (r (succ x_0) x_1 ... x_k)) 0
In order to handle general recursive functions, which might be partial, there is
a subtlety with composition: if g is not defined on n, then comp^f_g(n) should
not be defined, even if f is a constant function for instance, and this is not
the case with the current encoding. This is easily overcome with the following
construction: we write

t ↓ u = u (λx.x) t


for the term which should be read as “t provided u terminates”. It can be checked
that t ↓ u does not reduce to a normal form if u does not and that t ↓ n →*_β t
when n is a Church numeral. We can now use this trick to correct the behavior of
our encoding. For instance, the projection should be encoded as


⟦p^k_i⟧ = λx_1 ... x_k.(x_i ↓ x_1 ↓ ... ↓ x_k)


For the converse property, i.e. the definable functions are recursive, we should
encode λ-terms and their reduction into natural numbers, sometimes called
Gödel numbers. This can be done, see [Bar84] (or if you are willing to accept
that recursive functions are Turing-equivalent to usual programming languages,
this amounts to showing that we can make a program which reduces λ-terms,
which we can, see section 3.5).

3.3.7 Self-interpreting. We see that λ-calculus provides yet another model
which is equivalent to Turing machines. This means that the functions we can
compute in both models are the same, but not that they are equally simple
to implement in both models. For instance, constructing a universal Turing
machine is not an easy task: we have to decide on an encoding of the transitions
on the tape and then build a Turing machine to use this encoding and the
resulting machine is usually neither small nor particularly elegant.

In λ-calculus however, this is easy. For instance, we can encode a λ-term t
as a λ-term ⌜t⌝, as follows. We first pick a fresh variable i (by fresh we mean
here i ∉ FV(t)), replace every application u v in t by i u v and prepend λi to the
resulting term. For instance, the term


t = succ 0 = (λnfx.f (n f x)) (λfx.x)


is encoded as

⌜t⌝ = λi.i (λnfx.i f (i (i n f) x)) (λfx.x)
Even though the original term t could reduce, the term ⌜t⌝ cannot (because of
the manipulation we have performed on applications), and can thus be considered
as a decent encoding of t. We can then define an interpreter as


int = λt.t (λx.x)
This term has the property that, for every λ-term t, int ⌜t⌝ β-reduces to the
normal form of t. More details can be found in [Bar91, Mog92, Lyn17].
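
The encoding itself is a simple traversal of the term. The following sketch implements it on a small datatype of λ-terms with named variables; the type term and the function encode are names chosen here for illustration, and the caller is assumed to provide a variable i which does not occur in the term.

```ocaml
(* λ-terms with named variables. *)
type term =
  | Var of string
  | App of term * term
  | Abs of string * term

(* encode i t replaces every application u v by i u v and prepends λi.
   The variable i is assumed to be fresh for t (this is not checked). *)
let encode i t =
  let rec go = function
    | Var x -> Var x
    | App (u, v) -> App (App (Var i, go u), go v)
    | Abs (x, u) -> Abs (x, go u)
  in
  Abs (i, go t)
```

On the term succ 0 above, encode "i" produces the term λi.i (λnfx.i f (i (i n f) x)) (λfx.x).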


3.3.8 Adding constructors. Even though we have seen that all the usual
constructions can be encoded in the λ-calculus, it is often convenient to add
those as new explicit constructions to the calculus. For instance, products can
be added to the λ-calculus by extending the syntax of λ-terms to


t, u ::= x | t u | λx.t | (t, u) | π_l | π_r
The new expressions are
— (t, u): the pair of two terms t and u,
— π_l and π_r: the left and right projections respectively.


The β-reduction also has to be extended in order to account for those. We add
the two new reduction rules


π_l (t, u) →_β t        π_r (t, u) →_β u


which express the fact that the left (resp. right) projection extracts the left
(resp. right) component of a pair. Although most important properties (such as
confluence) generalize to such variants of λ-calculus, we stick here to the plain
one for simplicity. Some extensions are used and detailed in section 4.3.


3.4 Confluence of the λ-calculus


In order to be reasonably useful, the λ-calculus should be reasonably deterministic,
i.e. we should be able to speak about “the” result of the evaluation (by
which we mean the β-reduction) of a λ-term. The first observation we already
made is that, on a given term, multiple distinct reductions may be performed.
For instance,


      (λxy.y) ((λa.a) (λb.b))
        ↙               ↘
    λy.y           (λxy.y) (λb.b)


Another hope might be that if we reduce a term long enough, we will end
up with a normal form (a term that cannot be reduced further), which can
be considered as a result of the computation, and that if we perform two such
reductions on a term, we will end up on the same normal form: the intermediate
steps might not be the same, but in the end we always end up with the same
result. For instance, on natural numbers, we can speak of 10 as the result of


(1 + 2) + (3 + 4)
because it does not depend on the intermediate steps used to compute it:

          (1 + 2) + (3 + 4)
           ↙             ↘
    3 + (3 + 4)       (1 + 2) + 7
           ↘             ↙
                3 + 7
                  ↓
                 10


However, in the case of λ-calculus, this hope is vain because we have seen that
some terms might lead to infinite sequences of β-reductions, thus never reaching
a normal form.


3.4.1 Confluence. The property which turns out to be satisfied in the case of
λ-calculus is called confluence: it states that if a term t reduces
in many steps to a term u_1, and also to a term u_2, then there exists a term v
such that both u_1 and u_2 reduce in many steps to v:

          t
      *↙     ↘*
    u_1       u_2
      *↘     ↙*
          v

In other words, computation starting from a term t might lead to different
intermediate results, but there is always a way for those results to converge to
a common term.

Note that this result would not be valid if we required to have exactly one
reduction step each time. For instance, we need two reductions to complete the
following square on the right:


            (λyx.x y y) (I I)
             ↙             ↘
    (λyx.x y y) I       λx.x (I I) (I I)
             ↓               ↓
             ↓          λx.x (I I) I
             ↘             ↙
                λx.x I I

where I = λx.x is the identity. The easiest way to prove this confluence result
requires first introducing a variant of the β-reduction.


3.4.2 The parallel β-reduction. The parallel β-reduction —» is the smallest
relation on λ-terms such that


                               t —» t'    u —» u'
  -------- (β^∥_x)           ------------------------ (β^∥_s)
   x —» x                     (λx.t) u —» t'[u'/x]

   t —» t'    u —» u'               t —» t'
  --------------------- (β^∥_a)   ----------------- (β^∥_λ)
      t u —» t' u'                 λx.t —» λx.t'


As usual, we write —»* for the reflexive and transitive closure of the relation —».
Informally, t —» u means that u is obtained from t by reducing in one step
many of the β-redexes present in t at once. For instance, we have


(λxy.I x y) (I I) —» λy.I y —» λy.y


where the first step intuitively corresponds to simultaneously performing the
three β-reductions


(λxy.I x y) (I I) →_β λy.I (I I) y        I x →_β x        I I →_β I


As for usual β-reduction, the parallel β-reduction might create some β-redexes
which were not present in the original term, and could thus not be reduced at
first. For this reason, even though we can reduce in multiple places at once, we
cannot perform a parallel β-reduction step directly from the term on the left to
the term on the right in the above example.

In parallel β-reduction, we are allowed not to perform all the available β-reduction
steps. In particular, we may perform none:


Lemma 3.4.2.1. For every A-term t, we have t —» t. 


Proof. By induction on the term t. 
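
As an illustration, the following sketch computes one particular parallel reduction step: the one contracting all the redexes already present in a term. In order to avoid any issue with variable names, it uses de Bruijn indices (Var k refers to the k-th enclosing abstraction) instead of the named variables used in this chapter; this representation and the names shift, subst, beta and par are choices made here, not part of the development above.

```ocaml
(* λ-terms with de Bruijn indices: Var 0 is bound by the closest abstraction. *)
type term =
  | Var of int
  | App of term * term
  | Abs of term

(* shift d c t adds d to the variables of t whose index is at least c. *)
let rec shift d c = function
  | Var k -> if k >= c then Var (k + d) else Var k
  | App (t, u) -> App (shift d c t, shift d c u)
  | Abs t -> Abs (shift d (c + 1) t)

(* subst j s t replaces the variable j by s in t. *)
let rec subst j s = function
  | Var k -> if k = j then s else Var k
  | App (t, u) -> App (subst j s t, subst j s u)
  | Abs t -> Abs (subst (j + 1) (shift 1 0 s) t)

(* beta t u is the result of contracting the redex (λ.t) u. *)
let beta t u = shift (-1) 0 (subst 0 (shift 1 0 u) t)

(* One parallel step contracting every redex present in the term. *)
let rec par = function
  | Var k -> Var k
  | App (Abs t, u) -> beta (par t) (par u)
  | App (t, u) -> App (par t, par u)
  | Abs t -> Abs (par t)
```

On the example above, (λxy.I x y) (I I) is written App (Abs (Abs (App (App (i, Var 1), Var 0))), App (i, i)) with i = Abs (Var 0): a first application of par gives the representation of λy.I y and a second one gives the identity.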


3.4.3 Properties of the parallel β-reduction. We now study some properties
of the parallel β-reduction. Since it corresponds to performing β-reduction
steps in parallel, the relations —»* and →*_β coincide: we can simulate parallel
β-reduction with β-reduction and conversely. Moreover, we will see that parallel
β-reduction is easily shown to be confluent, from which we will be able to
deduce the confluence of β-reduction.

First, any β-reduction step can be simulated by a parallel reduction step:
Lemma 3.4.3.1. If t →_β u then t —» u.


Proof. By induction on the derivation of t →_β u.


Conversely, any parallel β-reduction step can be simulated by multiple β-reduction
steps:


Lemma 3.4.3.2. If t —» u then t →*_β u.


Proof. By induction on the derivation of t —» u. 


From this, we immediately deduce that the reflexive and transitive closure of
the two relations coincide:


Lemma 3.4.3.3. We have t —»* u if and only if t →*_β u.


Proof. If t —»* u, this means that we have a sequence of parallel reduction steps


t = t_0 —» t_1 —» t_2 —» ... —» t_n = u


Therefore, by lemma 3.4.3.2,


t = t_0 →*_β t_1 →*_β t_2 →*_β ... →*_β t_n = u


and thus t →*_β u. Conversely, if t →*_β u, this means that we have a sequence
of β-reduction steps


t = t_0 →_β t_1 →_β t_2 →_β ... →_β t_n = u


Therefore, by lemma 3.4.3.1,


t = t_0 —» t_1 —» t_2 —» ... —» t_n = u


and thus t —»* u.


Next, the parallel β-reduction is compatible with substitution.


Lemma 3.4.3.4. If t —» t' and u —» u' then t[u/x] —» t'[u'/x].
Proof. By induction on the derivation of t —» t'.
— If the last rule is

  -------- (β^∥_x)
   y —» y

then t = y = t' and we conclude with
y[u/x] = y —» y = y[u'/x]


or
x[u/x] = u —» u' = x[u'/x]


depending on whether y ≠ x or y = x.
— If the last rule is

     t_1 —» t_1'    t_2 —» t_2'
  -------------------------------- (β^∥_s)
   (λy.t_1) t_2 —» t_1'[t_2'/y]

with y ≠ x, then, by induction hypothesis, we have


t_1[u/x] —» t_1'[u'/x]        t_2[u/x] —» t_2'[u'/x]
and thus, by (β^∥_s),


(λy.t_1[u/x]) (t_2[u/x]) —» t_1'[u'/x][t_2'[u'/x]/y]


which can be rewritten as
((λy.t_1) t_2)[u/x] —» t_1'[t_2'/y][u'/x]


— If the last rule is

   t_1 —» t_1'    t_2 —» t_2'
  ----------------------------- (β^∥_a)
     t_1 t_2 —» t_1' t_2'

then, by induction hypothesis, we have
t_1[u/x] —» t_1'[u'/x]        t_2[u/x] —» t_2'[u'/x]


and thus, by (β^∥_a),
(t_1[u/x]) (t_2[u/x]) —» (t_1'[u'/x]) (t_2'[u'/x])


in other words
(t_1 t_2)[u/x] —» (t_1' t_2')[u'/x]
— If the last rule is

       t_1 —» t_1'
  ---------------------- (β^∥_λ)
   λy.t_1 —» λy.t_1'

then by induction hypothesis we have
t_1[u/x] —» t_1'[u'/x]
and thus, by (β^∥_λ),
(λy.t_1)[u/x] = λy.t_1[u/x] —» λy.t_1'[u'/x] = (λy.t_1')[u'/x]


and we are done.


We can use this lemma to show that the parallel β-reduction satisfies a variant of the
confluence property called the diamond property, or local confluence:


Lemma 3.4.3.5 (Diamond property). Suppose that t —» u and t —» u'. Then
there exists v such that u —» v and u' —» v:


        t
     ↙     ↘
    u       u'
     ↘     ↙
        v

Proof. Suppose that t —» u and t —» u'. We show the result by induction on
the derivation of t —» u.


— If the last rule of the derivation of t —» u is

  -------- (β^∥_x)
   x —» x

then t = x = u and, by lemma 3.4.2.1, we have u' —» u':

        x
     ↙     ↘
    x       u'
     ↘     ↙
        u'

— If the last rule of the derivation of t —» u is

    t_1 —» u_1    t_2 —» u_2
  ------------------------------ (β^∥_s)
   (λx.t_1) t_2 —» u_1[u_2/x]

we have two possible cases depending on the derivation of t —» u'.
— If the last rule of the derivation of t —» u' is

     t_1 —» u_1'    t_2 —» u_2'
  --------------------------------- (β^∥_s)
   (λx.t_1) t_2 —» u_1'[u_2'/x]

then, by induction hypothesis, there exists a term v_i such that u_i —» v_i
and u_i' —» v_i, for i = 1 and i = 2:

        t_1                    t_2
      ↙     ↘                ↙     ↘
    u_1      u_1'          u_2      u_2'
      ↘     ↙                ↘     ↙
        v_1                    v_2

Therefore, we have both
u_1[u_2/x] —» v_1[v_2/x] and u_1'[u_2'/x] —» v_1[v_2/x]
by lemma 3.4.3.4 and we can conclude:

             (λx.t_1) t_2
            ↙            ↘
    u_1[u_2/x]        u_1'[u_2'/x]
            ↘            ↙
              v_1[v_2/x]

— If the last rule of the derivation of t —» u' is

   λx.t_1 —» t_1'    t_2 —» u_2'
  --------------------------------- (β^∥_a)
     (λx.t_1) t_2 —» t_1' u_2'

then the last rule of the derivation of λx.t_1 —» t_1' is necessarily of
the form

        t_1 —» u_1'
  ----------------------- (β^∥_λ)
   λx.t_1 —» λx.u_1'

with t_1' = λx.u_1'. By induction hypothesis, we have the existence of
the reductions to v_1 and v_2 in

        t_1                    t_2
      ↙     ↘                ↙     ↘
    u_1      u_1'          u_2      u_2'
      ↘     ↙                ↘     ↙
        v_1                    v_2

We thus have
u_1[u_2/x] —» v_1[v_2/x]


by lemma 3.4.3.4 and
(λx.u_1') u_2' —» v_1[v_2/x]
by (β^∥_s) from which we conclude:

             (λx.t_1) t_2
            ↙            ↘
    u_1[u_2/x]        (λx.u_1') u_2'
            ↘            ↙
              v_1[v_2/x]

— If the last rule of the derivation of t —» u is

   t_1 —» u_1    t_2 —» u_2
  --------------------------- (β^∥_a)
     t_1 t_2 —» u_1 u_2

the derivation of t —» u' ends either with (β^∥_s) or (β^∥_a) and both cases are
handled similarly as above.
— If the last rule of the derivation of t —» u is

       t_1 —» u_1
  --------------------- (β^∥_λ)
   λx.t_1 —» λx.u_1

we can reason similarly as above.
From this follows easily the confluence property of the relation —» in two steps:


Lemma 3.4.3.6. Suppose that t —» u and t —»* u'. Then there exists v such
that u —»* v and u' —» v:

        t
     ↙     ↘*
    u       u'
    *↘     ↙
        v

Proof. By induction on the length of the upper-right reduction t —»* u', using
lemma 3.4.3.5.


Theorem 3.4.3.7 (Confluence). Suppose that t —»* u and t —»* u'. Then there
exists v such that u —»* v and u' —»* v:

        t
    *↙     ↘*
    u       u'
    *↘     ↙*
        v

Proof. By induction on the length of the upper-left reduction t —»* u, using
lemma 3.4.3.6.


3.4.4 Confluence and the Church-Rosser theorem. As a consequence of
the above lemmas, we can finally deduce the confluence property of λ-calculus,
first proved by Church and Rosser [CR36], the proof presented here being due
to Tait and Martin-Löf:


Theorem 3.4.4.1 (Confluence). The β-reduction is confluent: if t →*_β u_1 and
t →*_β u_2 then there exists v such that u_1 →*_β v and u_2 →*_β v:

          t
      *↙     ↘*
    u_1       u_2
      *↘     ↙*
          v

Proof. Suppose that t →*_β u_1 and t →*_β u_2. By lemma 3.4.3.3, we have
t —»* u_1 and t —»* u_2. From theorem 3.4.3.7, we deduce the existence of v such
that u_1 —»* v and u_2 —»* v and, by lemma 3.4.3.3 again, we have u_1 →*_β v
and u_2 →*_β v.
This implies the following theorem, sometimes called the Church-Rosser property
of λ-calculus:


Theorem 3.4.4.2 (Church-Rosser). Given two terms t and u such that t =_β u,
there exists a term v such that t →*_β v and u →*_β v:

    t   =_β   u
     *↘     ↙*
         v

Proof. By definition of β-equivalence, see section 3.2.7, there is n ∈ ℕ and
terms t_i for 0 ≤ i ≤ 2n such that


t = t_0 ←*_β t_1 →*_β t_2 ←*_β t_3 →*_β ... ←*_β t_{2n−1} →*_β t_{2n} = u
We show the result by induction on n. For n = 0, the result is obvious. Otherwise,
we can complete the diagram as follows:


          t_1          t_3      ...       t_{2n−1}
         ↙   ↘        ↙   ↘              ↙       ↘
t = t_0        t_2          ...   t_{2n−2}         t_{2n} = u
         ↘         (IH)          ↙                ↙
               v'                       (C)
                   ↘                  ↙
                            v

(all the arrows being reductions in many steps)
where (C) is obtained by theorem 3.4.4.1 and (IH) by induction hypothesis.


One of the most important consequences is that a λ-term cannot reduce to two
distinct normal forms: if the computation terminates then its result is uniquely
defined.


Proposition 3.4.4.3. If t and u are two β-equivalent terms in normal forms then
t = u.


Proof. By theorem 3.4.4.2, there exists v such that t →*_β v and u →*_β v.
Since t and u are normal forms, they cannot reduce and thus t = v = u.


Another byproduct is the so-called consistency of λ-calculus which states that
the β-equivalence relation does not identify all terms:


Theorem 3.4.4.4 (Consistency). There are terms which are not β-equivalent.


Proof. The terms λxy.x and λxy.y are normal forms. If they were equivalent
they would be equal by previous proposition, which is not the case.


3.5 Implementing reduction 


3.5.1 Reduction strategies. We have seen that a λ-term can reduce in many
ways, but in practice people implement a particular deterministic way of choosing
reductions to perform: this is called a reduction strategy. This is the case
for OCaml and is easily observed by inserting prints. For instance, the program


let p = print_endline
let _ = (p "a"; (fun x y -> p "b"; x + y)) (p "c"; 2) (p "d"; 3)


will always print d, c, a and b, in this order, in the toplevel. We shall now try
to look at the options we have here, in order to choose a strategy. A first question
we have to answer is: should we reduce functions or arguments first? Namely,
consider a term of the form (λx.t) u such that u reduces to u', we have two
possible ways of reducing it:


t[u/x] ←_β (λx.t) u →_β (λx.t) u'


which correspond to reducing functions or arguments first, giving rise to strategies
which are respectively called call-by-name and call-by-value. The call-by-value
strategy has a tendency to be more efficient: even if the argument is used multiple
times in the function, we reduce it only once beforehand, whereas the call-by-name
strategy reduces it each time it is used. For instance, if u →*_β û, where
û is a normal form, we have the following sequences of reductions:
— in call-by-value: (λx.f x x) u →*_β (λx.f x x) û →_β f û û,


— in call-by-name: (λx.f x x) u →_β f u u →*_β f û u →*_β f û û.

The function λx.f x x uses its argument twice and therefore we have to reduce u
twice in the second case compared to only once in the first (and this can make a
huge difference if the argument is used much more than twice or if the reduction
of u requires many steps). However, there is a case where the call-by-value
strategy is inefficient: when the argument is not used in the function. Namely,
we always reduce the argument, even if it is not used afterwards. For instance,
we have the following sequences of reductions:


— in call-by-value: (λx.y) u →*_β (λx.y) û →_β y
— in call-by-name: (λx.y) u →_β y


We have already observed in section 3.2.3 that β-reduction can duplicate and
erase β-redexes: the call-by-value strategy is optimized for duplication and the
call-by-name strategy is optimized for erasure. In practice, people often write
programs where they use a result multiple times and rarely discard the result of
computations, so that call-by-value strategies are generally implemented (this is
for instance the case in OCaml). However, for theoretical purposes call-by-value
strategies can be a problem: it might happen that a term has a normal form
and that this strategy does not find it. Namely, consider the term


(λx.y) Ω
A call-by-value strategy will first try to compute the normal form for Ω and thus
loop, whereas a call-by-name strategy will directly reduce it to y. A strategy is 


called normalizing when it will reach a normal form whenever a term has one: 
we have seen that call-by-value does not have this property. 
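
The difference between the two strategies can be observed in OCaml, which is call-by-value, by simulating call-by-name with thunks, i.e. by passing arguments as functions of type unit -> 'a which are called each time the value is needed. The sketch below uses a counter to record how many times the argument is evaluated; all the names in it are ours.

```ocaml
(* Number of times the argument has been evaluated so far. *)
let evaluations = ref 0

(* The argument u, as a thunk: each call counts as one evaluation. *)
let u () = incr evaluations; 21

(* A function using its argument twice, in both calling conventions. *)
let double_by_value x = x + x
let double_by_name x = x () + x ()

(* A function ignoring its argument: in call-by-name the thunk is never called. *)
let const_by_name _ = "y"

(* A looping computation playing the role of Ω. *)
let rec omega () : int = omega ()
```

Computing double_by_value (u ()) evaluates the argument once whereas double_by_name u evaluates it twice, and const_by_name omega returns "y" without ever running the looping computation.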


Orders on redexes. In more precise terms, to define a reduction strategy, we
have to choose the order in which we will reduce the redexes. Two partial orders
can be defined on redexes: 


— the imbrication order: a redex is inside another redex when it is a subterm 
of it, i.e. the redexes of t or of u are inside the redex 


(λx.t) u


— the horizontal order: in a subterm 
tu 
every redex in t is on the left of every redex in u. 


Any two redexes in a term can be compared with one of those orders; a strategy 
can thus be specified by which redexes it favors with respect to each of these 
orders: 


— a strategy is innermost (resp. outermost) when it begins with redexes
which are the most inside (resp. outside), 


— a strategy is left (resp. right) when it begins with redexes which are the 
most on the left (resp. right). 


For instance, the above examples illustrate the fact that the call-by-value and
call-by-name strategies are respectively innermost and outermost.

Partial evaluation. Another possibility which is generally offered when defining
a reduction strategy is to allow not reducing some terms which are not in normal
form. These terms can be thought of as “incomplete” and we are waiting for
some more information (e.g. an argument or the value of a free variable) in order
to further reduce them. Two such families are mainly considered:


— a strategy is weak when it does not reduce abstractions: a term of the
form λx.t will never be reduced, even if the term t contains redexes,


— a left strategy is head when it does not reduce variables applied to terms:
a term of the form x t_1 ... t_n will never be reduced, even if some term t_i
contains redexes.


A strategy is full when it is neither weak nor head. 

The reason for considering weak strategies is that a function is usually 
thought of as describing the actions to perform once arguments are given and 
it is therefore natural to delay their execution until we actually know those ar- 
guments. For instance, the strategy implemented in OCaml is weak: if it was 
not the case then the program 


let f n =
  print_endline "Incrementing!";
  n+1


would always print the message exactly once, even if the function is never called, 
whereas we expect that the message is printed each time the function is called 
(which is the case with a weak evaluation strategy). In pure λ-calculus, there is
no printing but one thing is easily observed: non-termination. For instance, we
want to be able to define a function which loops or not depending on a boolean
as follows:

λb.if b Ω I


This function takes a boolean as argument: if it is true it will return the term Ω
whose evaluation is going to loop, otherwise it returns the identity. If we evaluate
it with a weak strategy it will behave as expected, whereas if we use a non-weak
one, we might evaluate the body of the abstraction and thus loop when
reducing Ω, even if we give false as argument to the function.

Head reductions mostly make sense for strategies like call-by-name: in a
term (λx.t) u, we reduce to t[u/x] even if u is not in normal form because we
want to delay the evaluation of u until it is actually used. Now, in a term x u,
it might be the case that the free variable x will be replaced by an abstraction 
later on, and we want therefore to delay the evaluation of u until this is the 
case. 

A term which cannot be reduced by a weak or head strategy is not necessarily 
in normal form in the usual sense. Recall from proposition 3.2.6.1 that A-terms 
in normal form can be described by the following grammar: 


vis Anu | LUL...Un 
where v and v; are normal forms. A term is 
— a weak normal form when it is generated by 
vn=Agt | vU...Un 


where ¢ is a term and the v; are weak normal forms, 
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— a head normal form when it is generated by 
vis Agu | oty...tr 
where v is a head normal form and the t; are terms, 


— a weak head normal form when it is generated by 
vu=Agt | ety...th 
where t and the ¢; are terms. 


The terms which cannot be reduced in a weak (resp. head, resp. weak head)
strategy are precisely weak (resp. head, resp. weak head) normal forms. Weak
normal forms coincide with normal forms for terms which are not abstractions
(resp. closed terms): they do the job if we are mostly interested in those terms,
which we usually are. However, there are terms which are weak (resp. head)
normal forms, such as λx.I I (resp. x (I I)), and are not normal forms, so that a
weak (resp. head) strategy is never normalizing.
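
The four grammars above can be turned into membership tests. The following is only a sketch: it assumes variables are named by strings, and the helper names spine and is_nf are ours, not part of the course code.

```ocaml
(* λ-terms with named variables (string names are an assumption). *)
type term =
  | Var of string
  | App of term * term
  | Abs of string * term

(* Decompose t u1 ... un into its head t and the arguments [u1; ...; un]. *)
let rec spine t args =
  match t with
  | App (t, u) -> spine t (u :: args)
  | _ -> (t, args)

(* [is_nf weak head t] follows the four grammars: when [weak] is set, the
   body of an abstraction is unconstrained; when [head] is set, the
   arguments of a variable are unconstrained. *)
let rec is_nf weak head t =
  match t with
  | Abs (_, u) -> weak || is_nf weak head u
  | _ ->
    (match spine t [] with
     | Var _, args -> head || List.for_all (is_nf weak head) args
     | _ -> false (* an abstraction applied to arguments is a redex *))

let is_normal = is_nf false false
let is_weak_normal = is_nf true false
let is_head_normal = is_nf false true
let is_weak_head_normal = is_nf true true

(* The two examples of the text: λx.I I and x (I I). *)
let i = Abs ("y", Var "y")
let ii = App (i, i)
let weak_only = Abs ("x", ii)
let head_only = App (Var "x", ii)
```

Here weak_only is accepted by is_weak_normal and head_only by is_head_normal, while is_normal rejects both.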


Summary of strategies. We will detail below four reduction strategies, whose 
main properties are summarized below. Those strategies are the most well- 
known and used ones, but other variants could of course be considered: 


        | left. | inner. | weak | head | norm.
   -----+-------+--------+------+------+------
   AO   |   ✓   |   ✓    |      |      |
   CBV  |   ✓   |   ✓    |  ✓   |      |
   NO   |   ✓   |        |      |      |   ✓
   CBN  |   ✓   |        |  ✓   |  ✓   |


The columns respectively indicate whether the strategy is leftmost (or right- 
most), innermost (or outermost), weak, head and normalizing. 


Implementing λ-terms. In order to illustrate implementations of reduction strategies
in OCaml, we will encode λ-terms using the type


type term = 
| Var of var 
| App of term * term 
| Abs of var * term 


where var is an arbitrary type for identifying variables (in practice, we would 
choose int or maybe string). We will also need a substitution function, such 
that subst x t u computes the term u where all occurrences of the variable x 
have been replaced by the term t: 


let rec subst x t = function
  | Var y -> if x = y then t else Var y
  | App (u, v) -> App (subst x t u, subst x t v)
  | Abs (y', u) ->
    let y = fresh () in
    let u = subst y' (Var y) u in
    Abs (y, subst x t u)

In order to avoid name captures, we always refresh the names of the abstracted
variables when substituting under an abstraction (this is correct, but quite
inefficient): in order to do so we use a function fresh which generates a new
variable name each time it is called, e.g. using an internal counter incremented
at each call. For each of the considered strategies below, we will define a function
reduce which performs multiple β-reduction steps in the order specified by
the strategy.
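
One possible way of writing fresh is sketched below. It assumes that var is string and that names of the form "x@n" never occur in user terms; both choices are ours, made only for illustration.

```ocaml
type var = string (* assumption: variables are strings *)

type term =
  | Var of var
  | App of term * term
  | Abs of var * term

(* Hypothetical generator: an internal counter incremented at each call. *)
let fresh : unit -> var =
  let counter = ref 0 in
  fun () ->
    incr counter;
    "x@" ^ string_of_int !counter

(* The substitution function of the text. *)
let rec subst x t = function
  | Var y -> if x = y then t else Var y
  | App (u, v) -> App (subst x t u, subst x t v)
  | Abs (y', u) ->
    let y = fresh () in
    let u = subst y' (Var y) u in
    Abs (y, subst x t u)

(* (λy.x)[y/x]: the bound y is renamed first, so the new y stays free. *)
let example = subst "x" (Var "y") (Abs ("y", Var "x"))
```

The result is an abstraction over a generated name whose body is the free variable y, whereas a substitution without renaming would have produced the identity λy.y.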


Call-by-value. The call-by-value strategy (CBV) is by far the most common; it
is the one used by OCaml for instance. Its name comes from the fact that it
computes the value of the argument of a function before applying the function
to the argument. It is defined as the weak leftmost innermost strategy. This
means that, given an application tu,

1. we evaluate t until we obtain a term of the form λx.t′ (where t′ is not
   necessarily a normal form),

2. we then evaluate the argument u to a weak normal form û,

3. we then evaluate t′[û/x].

The reduction function associated to this strategy can be implemented as follows:

let rec reduce = function
  | Var x -> Var x
  | Abs (x, t) -> Abs (x, t)
  | App (t, u) ->
    match reduce t with
    | Abs (x, t') -> subst x (reduce u) t'
    | t -> App (t, reduce u)

In the case App (t, u) it can be observed that both terms t and u are always
reduced, so that taking the rightmost variant of the strategy has little effect.
Since it is a weak strategy, it is not normalizing, and normal forms for this
strategy will be weak normal forms. The above function does not directly compute
the weak normal form: it has to be iterated. For instance, applying it to
(λx.xy)(λx.x) will result in (λx.x)y, which further reduces to y.


Applicative order. The applicative order strategy (AO) is the leftmost innermost
strategy, i.e. the variant of call-by-value where we are allowed to reduce under
abstractions.

let rec reduce = function
  | Var x -> Var x
  | Abs (x, t) -> Abs (x, reduce t)
  | App (t, u) ->
    match reduce t with
    | Abs (x, t') -> subst x (reduce u) t'
    | t -> App (t, reduce u)

Normal forms are normal forms in the usual sense. As illustrated above, by the
term (λx.y)Ω, this strategy might not terminate even though the term has a
normal form, i.e. the strategy is not normalizing.

Call-by-name. The call-by-name strategy (CBN) is the weak head leftmost outermost
strategy. Here, arguments are computed at each use and not once and for all
as in the call-by-value strategy. An implementation of the corresponding reduction
is

let rec reduce = function
  | Var x -> Var x
  | Abs (x, t) -> Abs (x, t)
  | App (t, u) ->
    match reduce t with
    | Abs (x, t') -> subst x u t'
    | t -> App (t, u)


Iterating this function computes the weak head normal form for a term, which 
is the appropriate notion of normal form for the strategy. This strategy being 
weak and head, it is not normalizing. However, it can be shown that if a term 
has a weak head normal form, this strategy will compute it (this can be obtained 
as a variant of the corresponding result for the normal order, see below). 


Normal order. The normal order strategy (NO) is the leftmost outermost strategy.
An implementation is

let rec reduce = function
  | Var x -> Var x
  | Abs (x, t) -> Abs (x, reduce t)
  | App (t, u) ->
    match reduce_cbn t with
    | Abs (x, t') -> subst x u t'
    | t -> App (reduce t, reduce u)

where reduce_cbn is the above call-by-name reduction strategy. Normal forms
for this strategy are normal forms in the usual sense, and this strategy is actually
normalizing: it can be shown that if there is a way to reduce a λ-term to a normal
form then this strategy will find it, thus its name: this is a consequence of the
so-called standardization theorem in λ-calculus [Bar84, chapters 12 and 13], [SU06,
theorem 1.5.8].


Normalization. As indicated earlier, the reduce functions perform multiple reduction
steps in the order specified by the strategy, so that iterating them reduces
the term to its normal form following the strategy. A term can thus be normalized
according to one of the strategies using the following function:


let rec normalize t = 
let u = reduce t in 
if t = u then t else normalize u 
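
To see the strategies side by side, here is a self-contained sketch which runs them on the two examples discussed above. The names reduce_cbv, reduce_cbn and reduce_no, the string type for variables, the counter-based fresh, and passing the reduction function as an argument of normalize are our choices for this illustration.

```ocaml
type term = Var of string | App of term * term | Abs of string * term

(* Assumed fresh-name generator ("x@n" names are taken to be reserved). *)
let fresh =
  let n = ref 0 in
  fun () -> incr n; "x@" ^ string_of_int !n

let rec subst x t = function
  | Var y -> if x = y then t else Var y
  | App (u, v) -> App (subst x t u, subst x t v)
  | Abs (y', u) ->
    let y = fresh () in
    let u = subst y' (Var y) u in
    Abs (y, subst x t u)

(* Call-by-value: weak, arguments are reduced before substitution. *)
let rec reduce_cbv = function
  | Var x -> Var x
  | Abs (x, t) -> Abs (x, t)
  | App (t, u) ->
    (match reduce_cbv t with
     | Abs (x, t') -> subst x (reduce_cbv u) t'
     | t -> App (t, reduce_cbv u))

(* Call-by-name: weak head, arguments are substituted unevaluated. *)
let rec reduce_cbn = function
  | Var x -> Var x
  | Abs (x, t) -> Abs (x, t)
  | App (t, u) ->
    (match reduce_cbn t with
     | Abs (x, t') -> subst x u t'
     | t -> App (t, u))

(* Normal order: leftmost outermost, also reduces under abstractions. *)
let rec reduce_no = function
  | Var x -> Var x
  | Abs (x, t) -> Abs (x, reduce_no t)
  | App (t, u) ->
    (match reduce_cbn t with
     | Abs (x, t') -> subst x u t'
     | t -> App (reduce_no t, reduce_no u))

(* Iterate a reduction function until the term no longer changes. *)
let rec normalize reduce t =
  let u = reduce t in
  if t = u then t else normalize reduce u

let id = Abs ("x", Var "x")
let delta = Abs ("x", App (Var "x", Var "x"))
let omega = App (delta, delta)

(* (λx.xy)(λx.x): one call-by-value pass leaves (λx.x)y, iterating gives y. *)
let r1 = normalize reduce_cbv (App (Abs ("x", App (Var "x", Var "y")), id))

(* (λx.y)Ω: the outermost strategies discard Ω without evaluating it. *)
let r2 = normalize reduce_no (App (Abs ("x", Var "y"), omega))
let r3 = normalize reduce_cbn (App (Abs ("x", Var "y"), omega))
```

All three results are the variable y. Running reduce_cbv or an applicative-order function on the second term would not terminate, since the argument Ω is evaluated first.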


3.5.2 Normalization by evaluation. The implementations provided in section 3.5.1
are not really efficient because of one small reason: the substitution
function is not implemented efficiently. Doing this efficiently, while properly
taking bound variables into account, is actually quite difficult, see section 3.6.2.
When using a functional language such as OCaml, the compiler already has
support for that and we can use the reduction of the host language in order
to implement β-reduction and compute normal forms. This is called normalization
by evaluation: we implement the normalization function using the
evaluation of the language. We shall now see how to perform that in practice.


Evaluation. We begin by describing our λ-terms as usual:


type term = 
| Var of string 
| Abs of string * term 
| App of term * term 


For convenience, variable names are described by strings. For instance, we can
define the looping λ-term Ω = (λx.xx)(λx.xx) by


let omega =
  let o = Abs ("x", App (Var "x", Var "x")) in
  App (o, o)


We are going to evaluate those terms to normal forms, which we call here values.
We know from proposition 3.2.6.1 that those normal forms can be characterized
as the λ-terms v generated by the grammar

$$v ::= \lambda x.v \mid x\,v_1 \ldots v_n$$

The terms of the second form x v₁ … vₙ intuitively correspond to computations
which are “stuck” because we do not know the function which is to be applied to
the arguments: we only have a variable x here and will only be able to perform
the reduction when this variable is substituted by an actual abstraction. Those
are called neutral values and can be described by the grammar

$$n ::= x \mid n\,v$$

where v is a value. With this notation, values can be described by

$$v ::= \lambda x.v \mid n$$

We can thus describe values as the following datatype: 


type value = 
| VAbs of string * value 
| VNeu of neutral 
and neutral = 
| NVar of string 
| NApp of neutral * value 


Now, remember that our idea is to use the evaluation of the language. In order
to do so, the trick consists in describing a λ-term λx.t not as a pair consisting of
the variable x and the term t, but as the function

$$u \mapsto t[u/x]$$

which to a term u associates the term t with occurrences of x replaced by u:
after all, the only thing we want to be able to perform with the λ-term λx.t is
β-reduction! Instead of the above type, we thus actually describe values in this
way by

type value =
  | VAbs of (value -> value)
  | VNeu of neutral
and neutral =
  | NVar of string
  | NApp of neutral * value


We can then implement a function which evaluates a term to a value as follows: 


let rec eval env = function
  | Var x ->
    (try List.assoc x env with Not_found -> VNeu (NVar x))
  | Abs (x, t) -> VAbs (fun v -> eval ((x,v)::env) t)
  | App (t, u) -> vapp (eval env t) (eval env u)
and vapp v w =
  match v with
  | VAbs f -> f w
  | VNeu n -> VNeu (NApp (n, w))


The function eval takes as second argument the term to be evaluated (i.e. nor- 
malized) and, as first argument, an “environment” env which is a list of pairs 
(x,v) consisting of a variable x and a value v, such a pair indicating that the 
free variable x has to be replaced by the value v in the term during the evalu- 
ation (initially, this environment will typically be the empty list []). We then 
evaluate terms as follows: 


— if the term is a variable x, we try to look up in the environment if there
  is a value for it, in which case we return it, and return the variable x
  otherwise,

— if the term is an abstraction λx.t, we return the value which is the function
  which to a value v associates the evaluation of t in the environment where x
  is bound to v,

— if the term is an application tu, we evaluate t to a value t̂ and u to a
  value û; depending on the form of t̂, we have two cases:
  — if t̂ = f is a function, we simply apply it to û,
  — if t̂ = x v₁ … vₙ, we return x v₁ … vₙ û,

  this last part being taken care of by the auxiliary function vapp (which
  applies a value to another).


Finally, the environment is only really used during the evaluation and we define 
let eval t = eval [] t 


because from now on we will only use it in the empty environment. 

You should note that an abstraction is not evaluated right away; instead,
we construct a function which will evaluate it when an argument is given. In
this sense, the function actually computes the weak normal form for the term.
This function will not terminate if the β-reduction of the term does not. For
instance, the evaluation of Ω will not terminate:


let _ = eval omega

Readback. We are now pleased because we have a short and efficient implementation
of normalization, except for one point: we cannot easily print or serialize
values because they contain functions. We now explain how we can convert a
value back to a term: this procedure is called readback. We will need a countably
infinite pool of fresh variables, so that we define the function


let fresh i = "x@" ^ string_of_int i


which generates the name of the i-th fresh variable, that we call here “x@i”,
supposing that initial terms will never contain variable names of this form (say
that the user cannot use “@” in a variable name). The readback function can
be implemented as follows:


let rec readback i v =
  (* Read back a neutral term. *)
  let rec neutral = function
    | NVar x -> Var x
    | NApp (n, v) -> App (neutral n, readback i v)
  in
  match v with
  | VAbs f ->
    let x = fresh i in
    Abs (x, readback (i+1) (f (VNeu (NVar x))))
  | VNeu n -> neutral n


It takes as argument an integer i (the index of the first fresh variable we have 
not used yet) and a value v and returns the term corresponding to the value: 


— if the value is a function f, we return the term λx.t where t = f(x) for
  some fresh variable x,

— otherwise it is of the form x v₁ … vₙ, and we return x v̂₁ … v̂ₙ where v̂ᵢ is
  the term corresponding to the value vᵢ.


We can then define a function which normalizes a term by evaluating it to a value
and reading back the result:


let normalize t = readback 0 (eval t)


For instance, we can compute the normal form of the λ-term (λxy.x)y, which
is λz.y, by


let _ =
  let t = App (Abs ("x", Abs ("y", Var "x")), Var "y") in
  normalize t


which gives the expected result

Abs ("x@0", Var "y")


Note that this reduction requires α-converting the abstraction on y, and this
was correctly taken care of for us here.

Equivalence. Finally, we can test for β-equivalence of two λ-terms by comparing
their normal forms (see section 4.2.4):


let eq t u = (normalize t) = (normalize u)


This is not as obvious as it seems: it also takes care of α-conversion! Namely,
the readback function does not “randomly” generate fresh variables, but increments
the counter i starting from 0 when progressing into the term. Because
of this, it canonically renames the variables. For instance, one can check that
the functions λx.x and λy.y are equal


let () =
  let id = Abs ("x", Var "x") in
  let id' = Abs ("y", Var "y") in
  assert (eq id id')


Namely, both identity functions are going to be normalized into the term

Abs ("x@0", Var "x@0")
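
As a slightly larger illustration, the whole pipeline can be exercised on Church numerals. This is a sketch which repeats the definitions of this section so that it runs on its own; the helpers church and add, and the choice of passing the empty environment inside normalize, are ours.

```ocaml
type term =
  | Var of string
  | Abs of string * term
  | App of term * term

type value =
  | VAbs of (value -> value)
  | VNeu of neutral
and neutral =
  | NVar of string
  | NApp of neutral * value

let rec eval env = function
  | Var x -> (try List.assoc x env with Not_found -> VNeu (NVar x))
  | Abs (x, t) -> VAbs (fun v -> eval ((x, v) :: env) t)
  | App (t, u) -> vapp (eval env t) (eval env u)
and vapp v w =
  match v with
  | VAbs f -> f w
  | VNeu n -> VNeu (NApp (n, w))

let fresh i = "x@" ^ string_of_int i

let rec readback i v =
  let rec neutral = function
    | NVar x -> Var x
    | NApp (n, v) -> App (neutral n, readback i v)
  in
  match v with
  | VAbs f ->
    let x = fresh i in
    Abs (x, readback (i + 1) (f (VNeu (NVar x))))
  | VNeu n -> neutral n

let normalize t = readback 0 (eval [] t)
let eq t u = normalize t = normalize u

(* Church numeral n, i.e. λf.λx.f (f (... (f x))) with n occurrences of f. *)
let church n =
  let rec body k = if k = 0 then Var "x" else App (Var "f", body (k - 1)) in
  Abs ("f", Abs ("x", body n))

(* Addition: λm.λn.λf.λx.m f (n f x). *)
let add =
  Abs ("m", Abs ("n", Abs ("f", Abs ("x",
    App (App (Var "m", Var "f"),
         App (App (Var "n", Var "f"), Var "x"))))))

let two_plus_two = App (App (add, church 2), church 2)
```

With these definitions, eq two_plus_two (church 4) holds: both sides are read back with the canonical names x@0 and x@1, so the comparison is insensitive to the names f and x used when building the terms.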


The above equality normalizes two terms in order to compare them. It is 
not as efficient as it could be in the case where the two terms are not equivalent: 
we might ensure that two terms are not equivalent without fully normalizing 
them. In order to understand why, first observe the following: 


Lemma 3.5.2.1. Given a term t, the normal form of a term
— λx.t is necessarily of the form λx.t̂,
— x t is necessarily of the form x t̂,

where t̂ is the normal form of t.


Proof. This follows from the fact that the only possible way to β-reduce a term
λx.t (resp. x t) is of the form λx.t →β λx.t′ (resp. x t →β x t′) by the rule (β_λ)
(resp. (β_r)).


For this reason, we know that two terms of the form λx.t and x u are never
β-convertible, no matter what the terms t and u are. In such a situation,
there is thus no need to fully normalize the two terms to compare them. More
generally, a term x t₁ … tₙ is never equivalent to an abstraction. Based on this
observation, we can implement the test of β-equivalence as follows:


let eq t u =
  (* Equality of values *)
  let rec veq i v w =
    match v, w with
    | VAbs f, VAbs g ->
      let x = VNeu (NVar (fresh i)) in
      veq (i+1) (f x) (g x)
    | VNeu m, VNeu n -> neq i m n
    | _, _ -> false
  (* Equality of neutral terms *)
  and neq i m n =
    match m, n with
    | NVar x, NVar y -> x = y
    | NApp (m, v), NApp (n, w) -> neq i m n && veq i v w
    | _, _ -> false
  in
  veq 0 (eval t) (eval u)

Given two terms t and u, we reduce them to their weak normal form, i.e. we
reduce them until we find abstractions:


— if they are of the form λx.t′ and x u₁ … uₙ (or conversely), we know that
  they are not equivalent (even though we have not computed the normal
  form for t′),

— if they are of the form λx.t′ and λx.u′ then we compare t′ and u′ (which
  requires evaluating them further),

— if they are of the form x t₁ … tₘ and y u₁ … uₙ, where the tᵢ and uᵢ are
  weak normal forms, then they are equivalent if and only if x = y, m = n
  and tᵢ = uᵢ for every index i.


For instance, this procedure allows ensuring that λx.Ω is not convertible to x:

let () =
  let t = Abs ("x", omega) in
  assert (not (eq t (Var "x")))


whereas the former equality procedure would loop when comparing the two
terms because it tries to fully evaluate λx.Ω.


3.6 Nameless syntaxes 


It might be difficult to believe at first, but a great source of bugs in software
implementing compilers, proof assistants, proving functional programs, and so
on, comes from the incorrect handling of α-conversion. For instance, a naive
implementation of substitution is:

$$\begin{aligned}
x[u/x] &= u \\
y[u/x] &= y \qquad \text{when } y \neq x \\
(t\,t')[u/x] &= (t[u/x])\,(t'[u/x]) \\
(\lambda y.t)[u/x] &= \lambda y.t[u/x]
\end{aligned}$$

The last case is incorrect for two reasons. Firstly, we have to suppose that x ≠ y,
otherwise, the variables x inside t are not free, but rather bound by the abstraction,
and should thus not be substituted: this case should be (λx.t)[u/x] = λx.t.
Secondly, we also have to suppose y ∉ FV(u): we are substituting x by u under
the abstraction λy without taking into account the fact that y might get bound
in u in this way. For instance, this implementation would lead to the following
sequence of β-reductions

$$(\lambda y.yy)(\lambda fx.fx) \to (\lambda fx.fx)(\lambda fx.fx) \to \lambda x.(\lambda fx.fx)x \to \lambda xx.xx$$

In the last reduction step, there is an erroneous capture of x: a correct
normal form for the above term is λxy.xy. There are multiple ways around
this, and we have already seen in section 3.5.2 that normalization by evaluation
provides a satisfactory answer to this question. However, there are cases where
this technique is not an option (e.g. the host language is not functional, or we
want to perform more subtle manipulations than simply normalizing terms, or
we want to formalize λ-calculus in a proof assistant). We present below some
alternative syntaxes for λ-calculus which allow taking care of α-conversion in
terms and implement β-reduction correctly and efficiently.
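
Both defects of the naive substitution can be observed on small terms. The sketch below transcribes the four naive equations literally; the term type with string variables and the names naive_subst, defect1 and defect2 are our choices.

```ocaml
type term = Var of string | App of term * term | Abs of string * term

(* The naive substitution t[u/x], including its incorrect last case. *)
let rec naive_subst x u = function
  | Var y -> if y = x then u else Var y
  | App (t, t') -> App (naive_subst x u t, naive_subst x u t')
  | Abs (y, t) -> Abs (y, naive_subst x u t)

(* First defect: (λx.x)[u/x] should be λx.x, but the bound x is replaced. *)
let defect1 = naive_subst "x" (Var "u") (Abs ("x", Var "x"))

(* Second defect: (λx.f x)[x/f] should be λz.x z, but the free x is
   captured by the abstraction, giving λx.x x. *)
let defect2 = naive_subst "f" (Var "x") (Abs ("x", App (Var "f", Var "x")))
```

The second value is exactly the capture occurring in the last step of the reduction sequence above.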


3.6.1 The Barendregt convention. A first idea in order to avoid incorrect
captures of variables is to use the so-called Barendregt convention for naming
the variables of λ-terms: all variables which are λ-abstracted should be pairwise
distinct and distinct from all free variables.


Lemma 3.6.1.1. Every term is α-equivalent to one satisfying the Barendregt
convention.


This convention sometimes simplifies things. For instance, the above naive
implementation of β-reduction works on terms satisfying the convention. However,
after one β-reduction step, the λ-term is not guaranteed to satisfy the Barendregt
convention anymore (see the above example) and it is quite expensive to
have to α-convert the whole term at each reduction step in order to enforce the
convention.
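
The lemma is constructive: it suffices to rename every abstracted variable to a fresh one. A possible sketch follows, where the function name barendregt and the reserved prefix "v@" (assumed never to occur in the input) are our choices.

```ocaml
type term = Var of string | App of term * term | Abs of string * term

(* Rename each abstracted variable to a distinct generated name. The
   environment maps the original bound names to their new names; free
   variables are not in the environment and are left untouched. *)
let barendregt t =
  let counter = ref 0 in
  let fresh () = (incr counter; "v@" ^ string_of_int !counter) in
  let rec aux env = function
    | Var x -> Var (try List.assoc x env with Not_found -> x)
    | App (t, u) ->
      let t = aux env t in
      let u = aux env u in
      App (t, u)
    | Abs (x, t) ->
      let y = fresh () in
      Abs (y, aux ((x, y) :: env) t)
  in
  aux [] t
```

For instance, λx.x(λx.x) becomes λv@1.v@1(λv@2.v@2). Note that the whole term is traversed, which is the cost mentioned above when the convention has to be restored after each reduction step.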


3.6.2 De Bruijn indices. A more serious solution to this problem is given by
de Bruijn indices. The idea is that, in a closed term, every variable is created
by a specific abstraction in the term, so that instead of referring to a variable
by its name, we can identify it by the abstraction which created it. Moreover,
it turns out that there is a very convenient way to refer to an abstraction: the
number of abstractions we have to step over when going up in the syntactic tree
starting from the variable in order to reach the corresponding abstraction. This
number is called the de Bruijn index of the variable. For instance, consider the
λ-term

$$\lambda x.x(\lambda y.yx)$$

This λ-term can be graphically represented as a tree where a node labeled
“λx” corresponds to an abstraction and a node “@” corresponds to an
application:

[Figure: syntactic tree of λx.x(λy.yx), with dotted arrows from each variable to its abstraction.]

we have also figured in dotted arrows the links between a variable and the
abstraction which created it. In the first occurrences of the variables x and y,
the abstraction we are referring to is the one immediately above (we have to skip
0 λ's), whereas in the last occurrence of x, when going up starting from x in the
syntactic tree, the corresponding abstraction is not the first one (which is λy) but
the second one (we have to skip 1 λ). The information in the λ-term can thus
equivalently be represented by

$$\lambda x.0\,(\lambda y.0\,1)$$

where each variable has been replaced by the number of λ's we have to skip when
going up to reach the corresponding abstraction (note that a given variable, such
as x above, can have different indices, depending on its position in the term).
Now, the names of the variables do not really matter since we are working
modulo α-conversion: we might as well drop them and simply write

$$\lambda.0\,(\lambda.0\,1)$$

This is a very convenient notation because it does not mention variables anymore.
What is not entirely clear yet is that we can implement β-reduction in
this formalism. We will see that it is indeed possible, but quite subtle and
difficult to get right.


Terms with de Bruijn indices. We thus consider a variant of the λ-calculus where
terms are generated by the grammar

$$t, u ::= i \mid t\,u \mid \lambda.t$$

where i ∈ ℕ is the de Bruijn index of a variable. Following the preceding
remarks, a conversion function of_term from closed λ-terms into terms with de
Bruijn indices is provided in figure 3.1. It takes an auxiliary argument l which
is the list of variables already declared by abstractions: the de Bruijn index of
a variable is then the index of the variable in this list.

The preceding function will raise the exception Not_found if the term contains
free variables. It is however possible to adapt it to represent terms with
free variables using de Bruijn indices. The idea is that we should represent a
term t with n free variables FV(t) = {x₀, …, xₙ₋₁} as if we were computing the
de Bruijn representation of t in the term λxₙ₋₁.….λx₀.t, i.e. the free variables
are implicitly abstracted. For instance

$$\lambda x.x\,x_0\,x_2 \quad\text{is represented as}\quad \lambda.0\,1\,3$$

In practice, it is also possible (and convenient) to “mix” the two conventions:
have names for free variables and de Bruijn indices for bound variables. This is
called the locally nameless representation of λ-terms [Cha12].
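
The adaptation to terms with free variables can be sketched as follows: the conversion of figure 3.1 is simply started from a list of free variables instead of the empty list. The function name of_open_term and its ctx argument are hypothetical; the types and the index function are those of figure 3.1.

```ocaml
type lambda =
  | LVar of string
  | LApp of lambda * lambda
  | LAbs of string * lambda

type deBruijn =
  | Var of int
  | App of deBruijn * deBruijn
  | Abs of deBruijn

(* Index of an element in a list. *)
let rec index x = function
  | y :: l -> if x = y then 0 else 1 + index x l
  | [] -> raise Not_found

(* ctx = [x0; x1; ...] lists the free variables, x0 being the one which
   is implicitly abstracted innermost. *)
let of_open_term ctx t =
  let rec aux l = function
    | LVar x -> Var (index x l)
    | LApp (t, u) -> App (aux l t, aux l u)
    | LAbs (x, t) -> Abs (aux (x :: l) t)
  in
  aux ctx t

(* λx.x x0 x2 in the context x0, x1, x2. *)
let open_example =
  of_open_term ["x0"; "x1"; "x2"]
    (LAbs ("x", LApp (LApp (LVar "x", LVar "x0"), LVar "x2")))

(* The closed term λx.x(λy.yx). *)
let closed_example =
  of_open_term []
    (LAbs ("x", LApp (LVar "x", LAbs ("y", LApp (LVar "y", LVar "x")))))
```

The first value is the representation of λ.0 1 3 and the second one that of λ.0(λ.0 1), in accordance with the examples above.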


(** Traditional λ-terms. *)
type lambda =
  | LVar of string
  | LApp of lambda * lambda
  | LAbs of string * lambda

(** De Bruijn λ-terms. *)
type deBruijn =
  | Var of int
  | App of deBruijn * deBruijn
  | Abs of deBruijn

(** Index of an element in a list. *)
let rec index x = function
  | y::l -> if x = y then 0 else 1 + index x l
  | [] -> raise Not_found

(** De Bruijn representation of a closed term. *)
let of_term t =
  let rec aux l = function
    | LVar x -> Var (index x l)
    | LApp (t, u) -> App (aux l t, aux l u)
    | LAbs (x, t) -> Abs (aux (x::l) t)
  in
  aux [] t

Figure 3.1: Converting λ-terms into de Bruijn representation.


Reduction. Our goal is now to implement β-reduction in the de Bruijn representation
of terms. The rule is, as usual,

$$(\lambda.t)\,u \to_\beta t[u/0]$$

meaning that the variable 0 has to be replaced by u in t, where the substitution
t[u/0] remains to be defined. We actually need to define the substitution of any
variable since, when going under an abstraction, the index of the variable to be
substituted is increased by one. For instance, the β-reduction

$$\lambda x.(\lambda y.\lambda z.y)(\lambda t.t) \to_\beta \lambda x.(\lambda z.y)[\lambda t.t/y] = \lambda x.\lambda z.y[\lambda t.t/y] = \lambda x.\lambda z.\lambda t.t$$

i.e. graphically

[Figure: syntactic trees of λx.(λy.λz.y)(λt.t) and of its reduct λx.λz.λt.t.]


should correspond to the following steps

$$\lambda.(\lambda.\lambda.1)(\lambda.0) \to_\beta \lambda.(\lambda.1)[\lambda.0/0] = \lambda.\lambda.1[\lambda.0/1] = \lambda.\lambda.\lambda.0$$

We are thus tempted to define substitution by

$$\begin{aligned}
i[u/i] &= u \\
j[u/i] &= j \qquad \text{for } j \neq i \\
(t\,t')[u/i] &= (t[u/i])\,(t'[u/i]) \\
(\lambda.t)[u/i] &= \lambda.t[u/i+1]
\end{aligned}$$

But it is incorrect because, in the last case, u might contain free variables, which
refer to abstractions above, and have to be increased by 1 when going under the
abstraction. For instance,

$$\lambda x.(\lambda y.\lambda z.y)\,x \to_\beta \lambda x.(\lambda z.y)[x/y] = \lambda x.\lambda z.y[x/y] = \lambda x.\lambda z.x$$

i.e. graphically

[Figure: syntactic trees of λx.(λy.λz.y)x and of its reduct λx.λz.x.]

currently gives rise to the reduction

$$\lambda.(\lambda.\lambda.1)\,0 \to_\beta \lambda.(\lambda.1)[0/0] = \lambda.\lambda.1[0/1] = \lambda.\lambda.0$$

whereas the correct reduction is

$$\lambda.(\lambda.\lambda.1)\,0 \to_\beta \lambda.(\lambda.1)[0/0] = \lambda.\lambda.1[1/1] = \lambda.\lambda.1$$

The moral is that the last case of substitution should actually be

$$(\lambda.t)[u/i] = \lambda.t[u'/i+1] \tag{3.3}$$

where u′ is the term obtained from u by increasing by 1 all free variables (and
leaving other variables untouched), which we will write u′ = ↑₀ u in the following.
The “corrected version” with (3.3) still contains a bug, which comes from the fact
that β-reduction removes an abstraction, and therefore the indices of variables
in t referring to the variables abstracted above the removed abstraction have to
be decreased by 1. For instance,

$$\lambda x.(\lambda y.x)(\lambda t.t) \to_\beta \lambda x.x[\lambda t.t/y] = \lambda x.x$$

i.e. graphically

[Figure: syntactic trees of λx.(λy.x)(λt.t) and of its reduct λx.x.]

currently gives rise to the reduction

$$\lambda.(\lambda.1)(\lambda.0) \to_\beta \lambda.1[\lambda.0/0] = \lambda.1$$

whereas the correct reduction is

$$\lambda.(\lambda.1)(\lambda.0) \to_\beta \lambda.1[\lambda.0/0] = \lambda.0$$

This means that we should also correct the second case of substitution in order
to decrease the index of variables which were free in the original substitution.
And now we have it right.

In order to distinguish between bound and free variables in a term, it will be
convenient to maintain an index l, called the cutoff level, such that the indices
strictly below l correspond to bound variables and those above are free variables.
We thus first define a function ↑ₗ such that ↑ₗ t is the term obtained from t by
increasing by one all variables with index i ≥ l, called the lifting of t at level l.
By induction,

$$\begin{aligned}
\uparrow_l i &= \begin{cases} i & \text{if } i < l \\ i+1 & \text{if } i \geq l \end{cases} \\
\uparrow_l (t\,u) &= (\uparrow_l t)\,(\uparrow_l u) \\
\uparrow_l (\lambda.t) &= \lambda.(\uparrow_{l+1} t)
\end{aligned}$$

The right way to think about it is that ↑ₗ t is the term obtained from t by adding
a “new variable” of index l: the variables of index i ≥ l have to be increased
by 1 in order to make room for the new variable. Similarly, we can define a
function ↓ₗ such that, for every term t which does not contain the variable l, ↓ₗ t
is the term obtained by removing the variable l (the unlifting of t): all variables
of index i > l have to be decreased by one. It turns out that we will only need
it when t is a variable so that we define

$$\downarrow_l i = \begin{cases} i-1 & \text{if } i > l \\ i & \text{if } i < l \end{cases}$$

(it is not defined when i = l). With those at hand, we can finally correctly
define substitution:


Definition 3.6.2.1 (Substitution). Given terms t and u and a variable i, we define
the substitution of i by u in t

$$t[u/i]$$

by induction by

$$\begin{aligned}
i[u/i] &= u \\
j[u/i] &= \downarrow_i j \qquad \text{for } j \neq i \\
(t\,t')[u/i] &= (t[u/i])\,(t'[u/i]) \\
(\lambda.t)[u/i] &= \lambda.t[\uparrow_0 u/i+1]
\end{aligned}$$

As indicated above, β-reduction can then be implemented with the rule

$$(\lambda.t)\,u \to_\beta t[u/0]$$

An implementation of call-by-value β-reduction (see section 3.5.1) on λ-terms
in de Bruijn representation is given in figure 3.2.
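
The two problematic examples can be replayed mechanically. The sketch below puts the tempting definition (here called naive_sub) next to the corrected one; step_under is a hypothetical helper which contracts a redex sitting directly under one abstraction, and is parameterized by the substitution to use.

```ocaml
type term = Var of int | App of term * term | Abs of term

(* Tempting but wrong: u is not lifted under abstractions and the
   remaining variables are not unlifted. *)
let rec naive_sub i u = function
  | Var j -> if j = i then u else Var j
  | App (t, t') -> App (naive_sub i u t, naive_sub i u t')
  | Abs t -> Abs (naive_sub (i + 1) u t)

(* Lifting at level l. *)
let rec lift l = function
  | Var i -> if i < l then Var i else Var (i + 1)
  | App (t, u) -> App (lift l t, lift l u)
  | Abs t -> Abs (lift (l + 1) t)

(* Unlifting of a variable i at level l (undefined when i = l). *)
let unlift l i =
  assert (l <> i);
  if i < l then i else i - 1

(* Substitution as in definition 3.6.2.1. *)
let rec sub i u = function
  | Var j -> if j = i then u else Var (unlift i j)
  | App (t, t') -> App (sub i u t, sub i u t')
  | Abs t -> Abs (sub (i + 1) (lift 0 u) t)

(* Contract a redex (λ.t) u located directly under one abstraction. *)
let step_under sub = function
  | Abs (App (Abs t, u)) -> Abs (sub 0 u t)
  | t -> t

let ex1 = Abs (App (Abs (Abs (Var 1)), Var 0)) (* λ.(λ.λ.1) 0 *)
let ex2 = Abs (App (Abs (Var 1), Abs (Var 0))) (* λ.(λ.1)(λ.0) *)
```

With naive_sub the two examples reduce to λ.λ.0 and λ.1, which are the erroneous results discussed above, whereas with sub they reduce to λ.λ.1 and λ.0.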


3.6.3 Combinatory logic. Combinatory logic, which was introduced by Schönfinkel
[Sch24], and further studied by Curry [Cur30, CF58], is another possible
representation of λ-terms which does not need to use variable binding or
α-conversion: in this syntax, there is simply no need for variables. Introductory
references on the subject are [Bar84, chapter 7] and [Sel02].

Our starting point is the following question: is there a small number of
“basic” λ-terms such that every λ-term can be obtained (up to β-equivalence)
by applying the basic λ-terms one to the other? This would mean that all the
abstractions we need in λ-terms can be generated from those contained in the
basic λ-terms. It turns out that we only need three basic λ-terms, which encode
some possible manipulations of variables:


— I = λx.x corresponds to using a variable,
— S = λxyz.(xz)(yz) corresponds to duplicating a variable,
— K = λxy.x corresponds to erasing a variable.


It can be observed that the last abstracted variable of those terms is used,
duplicated and erased respectively. As surprising as it seems at first, we can
actually obtain any λ-term by application of those only. For instance, the term
λxy.yy can be obtained as K ((S I) I): you can check that

$$\mathrm{K}\,((\mathrm{S}\,\mathrm{I})\,\mathrm{I}) \to_\beta^* \lambda xy.yy$$

Moreover, we can give the β-reduction rules directly for the basic terms: given
any terms t, u, and v, we have

$$\mathrm{I}\,t \to_\beta^* t \qquad \mathrm{S}\,t\,u\,v \to_\beta^* (t\,v)(u\,v) \qquad \mathrm{K}\,t\,u \to_\beta^* t$$


These are the only possible reductions for terms made of basic terms, and we
have thus described β-reduction without using variables. This motivates the
study, in the following, of terms constructed from those, with the above rules as
reduction.

(** Lambda terms. *)
type term =
  | Var of int
  | App of term * term
  | Abs of term

(** Lift a term at l. *)
let rec lift l = function
  | Var i -> if i < l then Var i else Var (i + 1)
  | App (t, u) -> App (lift l t, lift l u)
  | Abs t -> Abs (lift (l+1) t)

(** Unlift a variable i at l. *)
let unlift l i =
  assert (l <> i);
  if i < l then i else i-1

(** Substitute u for the variable i in t. *)
let rec sub i u = function
  | Var j -> if j = i then u else Var (unlift i j)
  | App (t, t') -> App (sub i u t, sub i u t')
  | Abs t -> Abs (sub (i+1) (lift 0 u) t)

(** Call-by-value reduction. *)
let rec reduce = function
  | Var i -> Var i
  | Abs t -> Abs t
  | App (t, u) ->
    match reduce t with
    | Abs t' -> sub 0 (reduce u) t'
    | t -> App (t, reduce u)

Figure 3.2: Normalization of λ-terms using de Bruijn indices.

Definition. The terms of combinatory logic are generated by variables, application
and the three above constants. Formally, they are generated by the
grammar

$$T, U ::= x \mid T\,U \mid \mathrm{S} \mid \mathrm{K} \mid \mathrm{I}$$

where x is a variable, T and U are terms in combinatory logic and S, K and I
are constants. The reduction rules are

$$\mathrm{S}\,T\,U\,V \to (T\,V)(U\,V) \qquad \mathrm{K}\,T\,U \to T \qquad \mathrm{I}\,T \to T$$

$$\frac{T \to T'}{T\,U \to T'\,U} \qquad \frac{U \to U'}{T\,U \to T\,U'}$$

We implicitly bracket application on the left, i.e. T U V is read as (T U) V. As
usual, we write →* for the reflexive and transitive closure of the relation →,
and ↔* for its reflexive, symmetric and transitive closure. A normal form is
a term which does not reduce. We write FV(T) for the set of variables of a
term T. A combinator is a term without variables.


Implementation. In OCaml, the terms of combinatory logic can be described by
the type


type term =
  | Var of var
  | App of term * term
  | S | K | I


The leftmost outermost reduction strategy (see section 3.5.1) can be shown to
be normalizing: if a term admits a normal form then this strategy will reach it
(we will see in theorem 3.6.3.3 that this normal form is necessarily unique). In
OCaml, it can be implemented as follows:


let rec normalize t =
  match t with
  | Var _ | S | K | I -> t
  | App (t, v) ->
    match normalize t with
    | I -> normalize v
    | App (K, t) -> normalize t
    | App (App (S, t), u) ->
      normalize (App (App (t, v), App (u, v)))
    | t -> App (t, normalize v)


An alternative, more elegant and efficient, implementation of this normalization
procedure can be achieved by taking an additional argument env, which is a list
of arguments the current term is applied to. This is sometimes called Krivine’s
trick: compared to the previous implementation, we avoid normalizing the same
term multiple times.


let rec normalize t env =
  match t, env with
  | App (t, u), _ -> normalize t (u::env)
  | I, t::env -> normalize t env
  | K, t::u::env -> normalize t env
  | S, t::u::v::env -> normalize t (v::(App(u,v))::env)
  | t, env -> (* apply to normalized arguments *)
    List.fold_left (fun t u -> App (t, normalize u [])) t env


Example 3.6.3.1. Consider the combinator S K K. It satisfies, for any term T,

$$\mathrm{S}\,\mathrm{K}\,\mathrm{K}\,T \to \mathrm{K}\,T\,(\mathrm{K}\,T) \to T$$

Therefore the combinator I is superfluous in our system, since it can be implemented
as

$$\mathrm{I} = \mathrm{S}\,\mathrm{K}\,\mathrm{K}$$

which means that we could have restricted ourselves to the two combinators S
and K only.
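
This example can be checked with the normalization function using an argument list. The sketch below repeats it so as to be runnable on its own, assuming that variables are strings; the name skk is ours.

```ocaml
type term =
  | Var of string
  | App of term * term
  | S | K | I

(* Leftmost outermost normalization; env is the list of pending arguments. *)
let rec normalize t env =
  match t, env with
  | App (t, u), _ -> normalize t (u :: env)
  | I, t :: env -> normalize t env
  | K, t :: _ :: env -> normalize t env
  | S, t :: u :: v :: env -> normalize t (v :: App (u, v) :: env)
  | t, env ->
    (* the head is stuck: normalize the arguments and rebuild the term *)
    List.fold_left (fun t u -> App (t, normalize u [])) t env

let skk = App (App (S, K), K)

(* S K K x and S K K (I y) *)
let r1 = normalize (App (skk, Var "x")) []
let r2 = normalize (App (skk, App (I, Var "y"))) []
```

The results are x and y respectively, as expected from a term behaving like I. Note that S K K itself is a normal form, since S has only received two arguments.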


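Example 3.6.3.2. The term (S I I)(S I I) leads to an infinite sequence of reductions:

$$(\mathrm{S}\,\mathrm{I}\,\mathrm{I})(\mathrm{S}\,\mathrm{I}\,\mathrm{I}) \to^* (\mathrm{S}\,\mathrm{I}\,\mathrm{I})(\mathrm{I}\,(\mathrm{S}\,\mathrm{I}\,\mathrm{I})) \to (\mathrm{S}\,\mathrm{I}\,\mathrm{I})(\mathrm{S}\,\mathrm{I}\,\mathrm{I}) \to^* \cdots$$

Theorem 3.6.3.3. The reduction relation → is confluent.


Proof. This can be shown as in section 3.4, by introducing a notion of parallel
reduction and showing that it has the diamond property.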


Abstraction. We can simulate abstractions in combinatory logic as follows. Given 
a variable x and a term T, we define a new term Az.T by 


Au.x =| 
Ag.T=KT ifa ¢ FV(T), 
Aa.(TU) =S (Aa.T) (Av.U) otherwise. 


Example 3.6.3.4. We have 
Aa.Ay.« = S(KK)|I 


Note that the term on the right is a normal form (in particular, it does not 
reduce to kK). 


Given terms T, U, we write T[U/x] for the term T where the variable x has been 
replaced by U. 


Lemma 3.6.3.5. For any terms $T$, $U$ and variable $x$, we have

\[ (\Lambda x.T)\,U \longrightarrow^* T[U/x] \]


Proof. By induction on $T$.


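Translation. We can now define translations between λ-terms and combinatory
terms:


— we translate a λ-term $t$ as a combinatory term $\llbracket t\rrbracket_{\mathrm{cl}}$,


— we translate a combinatory term $T$ as a λ-term $\llbracket T\rrbracket_{\lambda}$.

These transformations are defined inductively as follows:

\[
\begin{array}{rclcrcl}
\llbracket x\rrbracket_{\mathrm{cl}} &=& x &\qquad& \llbracket x\rrbracket_{\lambda} &=& x \\
\llbracket t\,u\rrbracket_{\mathrm{cl}} &=& \llbracket t\rrbracket_{\mathrm{cl}}\,\llbracket u\rrbracket_{\mathrm{cl}} && \llbracket T\,U\rrbracket_{\lambda} &=& \llbracket T\rrbracket_{\lambda}\,\llbracket U\rrbracket_{\lambda} \\
\llbracket \lambda x.t\rrbracket_{\mathrm{cl}} &=& \Lambda x.\llbracket t\rrbracket_{\mathrm{cl}} && \llbracket S\rrbracket_{\lambda} &=& \lambda xyz.(x z)(y z) \\
&&&& \llbracket K\rrbracket_{\lambda} &=& \lambda xy.x \\
&&&& \llbracket I\rrbracket_{\lambda} &=& \lambda x.x
\end{array}
\]
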
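Example 3.6.3.6. For instance, we have the following translations of λ-terms:
\[ \llbracket \lambda xy.x\rrbracket_{\mathrm{cl}} = S\,(K\,K)\,I \qquad\qquad \llbracket \lambda xy.y\,y\rrbracket_{\mathrm{cl}} = K\,(S\,I\,I) \]

and of a combinatory term:
\[ \llbracket S\,(K\,K)\,I\rrbracket_{\lambda} = (\lambda xyz.(x z)(y z))\,((\lambda xy.x)(\lambda xy.x))\,(\lambda x.x) \]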
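
Both translations are easily programmed by recursion on terms. The sketch below assumes the representations `lam` and `cl` for λ-terms and combinatory terms; all the type, constructor and function names are hypothetical and chosen for this illustration only.

```ocaml
(* Untyped lambda-terms and combinatory terms (assumed representations). *)
type lam = LVar of string | LApp of lam * lam | LAbs of string * lam
type cl = Var of string | S | K | I | App of cl * cl

let rec occurs x = function
  | Var y -> x = y
  | S | K | I -> false
  | App (t, u) -> occurs x t || occurs x u

(* Simulation of abstraction in combinatory logic. *)
let rec abstract x t =
  match t with
  | Var y when y = x -> I
  | _ when not (occurs x t) -> App (K, t)
  | App (t, u) -> App (App (S, abstract x t), abstract x u)
  | _ -> assert false (* unreachable *)

(* From lambda-terms to combinatory terms. *)
let rec to_cl = function
  | LVar x -> Var x
  | LApp (t, u) -> App (to_cl t, to_cl u)
  | LAbs (x, t) -> abstract x (to_cl t)

(* From combinatory terms to lambda-terms. *)
let rec to_lam = function
  | Var x -> LVar x
  | App (t, u) -> LApp (to_lam t, to_lam u)
  | S ->
    LAbs ("x", LAbs ("y", LAbs ("z",
      LApp (LApp (LVar "x", LVar "z"), LApp (LVar "y", LVar "z")))))
  | K -> LAbs ("x", LAbs ("y", LVar "x"))
  | I -> LAbs ("x", LVar "x")
```

Applying `to_cl` to the two λ-terms of example 3.6.3.6 gives back the combinatory terms computed there.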


The reduction of combinatory terms can be simulated in λ-calculus:

Lemma 3.6.3.7. For any terms $T$, $U$, if $T \longrightarrow U$ then $\llbracket T\rrbracket_{\lambda} \longrightarrow_\beta^* \llbracket U\rrbracket_{\lambda}$.


Proof. By induction on $T$.


Lemma 3.6.3.8. For any term $T$, $\llbracket \Lambda x.T\rrbracket_{\lambda} \longrightarrow_\beta^* \lambda x.\llbracket T\rrbracket_{\lambda}$.


Proof. By induction on $T$.


Translating a λ-term back and forth has no effect up to β-equivalence:

Lemma 3.6.3.9. For any λ-term $t$, $\llbracket\llbracket t\rrbracket_{\mathrm{cl}}\rrbracket_{\lambda} \longrightarrow_\beta^* t$.


Proof. By induction on $t$, using the previous lemma in the case of abstraction.


The previous lemma, together with lemma 3.6.3.7, can be seen as the fact that
combinatory logic embeds into λ-calculus (modulo β-reduction). It also implies
that the basic combinators $S$, $K$ and $I$ can be thought of as a “basis” from which
all the λ-terms can be generated:


Corollary 3.6.3.10. Every closed λ-term is β-equivalent to one obtained from $S$,
$K$ and $I$ by application.


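This correspondence between λ-calculus and combinatory logic unfortunately
has a number of minor defects. First, it is not true that, for λ-terms $t$ and $u$,
\[ t \longrightarrow_\beta u \quad\text{implies}\quad \llbracket t\rrbracket_{\mathrm{cl}} \longrightarrow^* \llbracket u\rrbracket_{\mathrm{cl}} \]
For instance, we have

\[ \llbracket \lambda x.(\lambda y.y)\,x\rrbracket_{\mathrm{cl}} = S\,(K\,I)\,I \qquad\qquad \llbracket \lambda x.x\rrbracket_{\mathrm{cl}} = I \tag{3.4} \]

where both combinatory terms are normal forms. If we try to go through the
induction, the problem comes from the fact that β-reduction satisfies the rule
on the left below, often called (ξ), whereas the corresponding principle on the
right is not valid in combinatory logic:

\[
\frac{t \longrightarrow_\beta t'}{\lambda x.t \longrightarrow_\beta \lambda x.t'}\;(\beta_\lambda)
\qquad\qquad
\frac{T \longrightarrow T'}{\Lambda x.T \longrightarrow \Lambda x.T'}
\]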
as the above example illustrates. Intuitively, this is due to the fact that we have
not yet provided enough arguments to the terms. Namely, if we apply both
terms of (3.4) to an arbitrary term $T$, we obtain the same result:

\[ S\,(K\,I)\,I\,T \longrightarrow K\,I\,T\,(I\,T) \longrightarrow I\,(I\,T) \longrightarrow I\,T \longrightarrow T \qquad\text{and}\qquad I\,T \longrightarrow T \]
In general, it can be shown that
\[ t \longrightarrow_\beta u \quad\text{implies}\quad \llbracket t\rrbracket_{\mathrm{cl}}\,T_1 \ldots T_n \overset{*}{\longleftrightarrow} \llbracket u\rrbracket_{\mathrm{cl}}\,T_1 \ldots T_n \]

for all terms $T_i$, provided that $n$ is a large enough natural number depending
on $t$ and $u$. It is also not true that the translation of a combinatory term in
normal form is a normal λ-term:

\[ \llbracket K\,x\rrbracket_{\lambda} = (\lambda xy.x)\,x \longrightarrow_\beta \lambda y.x \]

Again, the term $K\,x$ is intuitively a normal form only because it is not applied
to enough arguments. Finally, given a term $T$, the terms $\llbracket\llbracket T\rrbracket_{\lambda}\rrbracket_{\mathrm{cl}}$ and $T$ are
not convertible in general. For instance

\[ \llbracket\llbracket K\rrbracket_{\lambda}\rrbracket_{\mathrm{cl}} = \llbracket \lambda xy.x\rrbracket_{\mathrm{cl}} = S\,(K\,K)\,I \neq K \]

Both terms are normal forms and if they were convertible, they would reduce to
a common term by theorem 3.6.3.3. This is again due to the lack of arguments:
for every term $T$, we have

\[ S\,(K\,K)\,I\,T \longrightarrow^* K\,T \]

Two combinatory terms $T$ and $T'$ are extensionally equivalent when, for every
term $U$, we have $T\,U \overset{*}{\longleftrightarrow} T'\,U$. It can be shown that combinatory terms modulo
reduction and extensional equivalence are in bijection with λ-terms modulo β
and η, via the translations we have defined.


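Iota. We have seen in example 3.6.3.1 that the combinator $I$ is superfluous,
so that the two combinators $S$ and $K$ are sufficient. Can we remove another
combinator? With $S$ and $K$ we cannot. We can however come up with one
combinator which subsumes both $S$ and $K$: if we define the λ-term

\[ \iota = \lambda x.x\,S\,K \]
we have
\[ I = \iota\,\iota \qquad\qquad K = \iota\,(\iota\,(\iota\,\iota)) \qquad\qquad S = \iota\,(\iota\,(\iota\,(\iota\,\iota))) \]

We can therefore base combinatory logic on the only combinator $\iota$, the reduction
rule being
\[ \iota\,T \longrightarrow T\,S\,K = T\,(\iota\,(\iota\,(\iota\,(\iota\,\iota))))\,(\iota\,(\iota\,(\iota\,\iota))) \]

In the sense described above, any λ-term can thus be encoded as a combinator
based on $\iota$, i.e. as a term generated by the grammar

\[ t, u ::= \iota \mid t\,u \]
Any λ-term can thus be encoded as a binary word: a term $t$ generated by this
grammar is encoded as the word $[t]$ defined by
\[ [\iota] = 1 \qquad\qquad [t\,u] = 0\,[t]\,[u] \]
so that $\iota\,(\iota\,(\iota\,\iota))$ is encoded as 0101011.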
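
The binary encoding is a two-line recursive function. In the sketch below, the type `iota` and the function `encode` are hypothetical names introduced for the illustration.

```ocaml
(* Terms generated by the only combinator iota (assumed representation). *)
type iota =
  | Iota
  | App of iota * iota

(* Binary word encoding a term: 1 for iota, 0 followed by both subterms for an application. *)
let rec encode = function
  | Iota -> "1"
  | App (t, u) -> "0" ^ encode t ^ encode u

(* The term iota (iota (iota iota)), which plays the role of K. *)
let k = App (Iota, App (Iota, App (Iota, Iota)))
let word = encode k
```

The value `word` is the string `"0101011"` given above. This is a prefix code: the number of `1` in the encoding of a term always exceeds the number of `0` by exactly one, which makes it possible to decode a word unambiguously.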


CHAPTER 4 


Simply typed λ-calculus


While the λ-calculus introduced in chapter 3 can be seen as the functional core of a
programming language, the simply typed λ-calculus studied in this chapter is
the core of a typed programming language. It will allow us to give a formal
meaning to the title of the book: we will see that a type can be seen as a
formula and a typable λ-term corresponds precisely to a proof of its type. This
is the so-called Curry-Howard correspondence which is at the heart of this course.
From a historical point of view, this calculus was introduced by Church in the
40s [Chu40], in order to provide a foundation for logic and mathematics. Good
further reading material on the subject includes [Pie02, SU06].

We introduce types for λ-calculus in section 4.1, show that typable terms are
terminating in section 4.2, extend typing to type constructors other than arrows
in section 4.3, discuss the variant where abstracted variables are not typed in
section 4.4, discuss the relationship between Hilbert calculus and combinators
in section 4.5, and finally present extensions to classical logic in section 4.6.


4.1 Typing 


4.1.1 Types. A simple type is an expression made of variables and arrows.
Those are generated by the grammar

\[ A, B ::= X \mid A \to B \]
A simple type is thus either
— a type variable $X$,


— an arrow type $A \to B$, read as the type of functions from $A$ to $B$ (where
$A$ and $B$ are themselves simple types).


By convention arrows are implicitly bracketed on the right: $A \to B \to C$ is read
as $A \to (B \to C)$.
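
The bracketing convention can be made concrete with a small printing function for types. This is only a sketch: the type `ty` is the one used for the implementation of type inference later in this chapter, and the function name `to_string` is hypothetical.

```ocaml
(* Simple types: type variables and arrows. *)
type ty =
  | TVar of string
  | Arr of ty * ty

(* Print a type, with parentheses only around arrows on the left of an arrow. *)
let rec to_string = function
  | TVar x -> x
  | Arr ((Arr _ as a), b) -> "(" ^ to_string a ^ ") -> " ^ to_string b
  | Arr (a, b) -> to_string a ^ " -> " ^ to_string b

let right = to_string (Arr (TVar "A", Arr (TVar "B", TVar "C")))
let left = to_string (Arr (Arr (TVar "A", TVar "B"), TVar "C"))
```

The string `right` is `"A -> B -> C"`, without parentheses since arrows associate to the right, whereas `left` is `"(A -> B) -> C"`.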


4.1.2 Contexts. A context
\[ \Gamma = x_1 : A_1, \ldots, x_n : A_n \]

is a list of pairs consisting of a variable $x_i$ (in the sense of λ-calculus, see
section 3.1.1) and a type $A_i$. A context is thus either the empty context or of
the form $\Gamma, x : A$ for some context $\Gamma$, which is useful to reason by induction on
contexts. The domain $\mathrm{dom}(\Gamma)$ of the context $\Gamma$ is the set of variables occurring
in it:
\[ \mathrm{dom}(\Gamma) = \{x_1, \ldots, x_n\} \]

Given a variable $x \in \mathrm{dom}(\Gamma)$, we sometimes write $\Gamma(x)$ for the type associated
with it. Here, we do not require that in a context all the variables $x_i$ are
distinct: to be precise, $\Gamma(x)$ is the type of the rightmost pair $x : A$ occurring in $\Gamma$, which
can be defined by induction by

\[ (\Gamma, x : A)(x) = A \qquad\qquad (\Gamma, y : A)(x) = \Gamma(x) \]
for $y \neq x$.
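
The inductive definition of $\Gamma(x)$ can be sketched as follows, assuming that a context is represented by a list of pairs stored in reverse order, so that the rightmost pair of the context is the head of the list. The names `lookup` and `ctx` are hypothetical.

```ocaml
(* Simple types. *)
type ty =
  | TVar of string
  | Arr of ty * ty

(* The type associated with x in a context: the first matching pair of the list,
   i.e. the rightmost one in the context. *)
let rec lookup ctx x =
  match ctx with
  | [] -> None
  | (y, a) :: ctx -> if y = x then Some a else lookup ctx x

(* The context x : A, y : C, x : B, stored in reverse order. *)
let ctx = [ ("x", TVar "B"); ("y", TVar "C"); ("x", TVar "A") ]
```

With this representation, `lookup ctx "x"` is `Some (TVar "B")`: the rightmost declaration of `x` shadows the other one. The standard function `List.assoc` behaves in the same way, excepting that it raises `Not_found` instead of returning `None`.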


4.1.3 λ-terms. We are going to consider a small variation of λ-terms: we
suppose that all λ-abstractions specify the type of the abstracted variable. The
syntax for terms is thus

\[ t, u ::= x \mid t\,u \mid \lambda x^A.t \]

where $x$ is a variable, $t$ and $u$ are terms, and $A$ is a type. An abstraction $\lambda x^A.t$
should be read as a function taking an argument $x$ of type $A$ and returning $t$.


Church vs Curry style for λ-terms. The above convention, where abstractions
are typed, is called Church style λ-terms. We will see that adopting it greatly
simplifies the questions one is usually interested in for those terms (such as type
checking, see section 4.1.6), at the cost of requiring small annotations from the
user (the type of the abstractions).

A variant of the theory where abstractions are not typed can also be
developed and is called Curry style, see section 4.4. This is for instance the convention
used in OCaml: one would typically write


let f = fun x -> x 
although the Church style is also supported, i.e. we can also write 


let f = fun (x:int) -> x 


4.1.4 Typing. A sequent is a triple written as
\[ \Gamma \vdash t : A \tag{4.1} \]

consisting of a context $\Gamma$, a λ-term $t$ and a type $A$. A term $t$ has type $A$ in a
context $\Gamma$ when the sequent (4.1) is derivable using the three rules of figure 4.1
where, in the rule (ax), we suppose $x \in \mathrm{dom}(\Gamma)$ satisfied as a side condition.
Those rules can be read as follows:


— (ax): in an environment where $x$ is of type $A$, we know that $x$ is of type $A$,


— $(\to_I)$: if, supposing $x$ is of type $A$, $t$ is of type $B$, then the function $\lambda x^A.t$
which to $x$ associates $t$ is of type $A \to B$,


— $(\to_E)$: given a function $t$ of type $A \to B$ and an argument $u$ of type $A$,
the result of the application $t\,u$ is of type $B$.


We simply say that the term $t$ has type $A$ if it is so in the empty context. A
derivation in this system is sometimes called a typing derivation. A term $t$ is
typable when it has some type $A$ in some context $\Gamma$.

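\[
\frac{}{\Gamma \vdash x : \Gamma(x)}\;(\text{ax})
\qquad
\frac{\Gamma, x : A \vdash t : B}{\Gamma \vdash \lambda x^A.t : A \to B}\;(\to_I)
\qquad
\frac{\Gamma \vdash t : A \to B \qquad \Gamma \vdash u : A}{\Gamma \vdash t\,u : B}\;(\to_E)
\]

Figure 4.1: Typing rules of simply-typed λ-calculus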


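Example 4.1.4.1. The term
\[ \lambda f^{A \to A}.\lambda x^A.f\,(f\,x) \]

has type
\[ (A \to A) \to A \to A \]

Namely, we have the typing derivation

\[
\dfrac{\dfrac{\dfrac{\dfrac{}{\Gamma \vdash f : A \to A}\,(\text{ax}) \qquad \dfrac{\dfrac{}{\Gamma \vdash f : A \to A}\,(\text{ax}) \qquad \dfrac{}{\Gamma \vdash x : A}\,(\text{ax})}{\Gamma \vdash f\,x : A}\,(\to_E)}{f : A \to A, x : A \vdash f\,(f\,x) : A}\,(\to_E)}{f : A \to A \vdash \lambda x^A.f\,(f\,x) : A \to A}\,(\to_I)}{\vdash \lambda f^{A \to A}.\lambda x^A.f\,(f\,x) : (A \to A) \to A \to A}\,(\to_I)
\]

with
\[ \Gamma = f : A \to A, x : A \]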


Remark 4.1.4.2. Although this will mostly remain implicit in the following, we
consider sequents up to α-conversion: this means that, in a sequent $\Gamma \vdash t : A$,
we can change a variable $x$ into $y$ both in $\Gamma$ and in $t$ at the same time, provided
that $y \notin \mathrm{dom}(\Gamma)$. Because of this, we can always assume that all the variables
are distinct in the contexts we consider. This assumption is sometimes useful
to reason about proofs, e.g. with this convention, the axiom rule is equivalent
to

\[ \frac{}{\Gamma, x : A, \Gamma' \vdash x : A}\;(\text{ax}) \]

We are however reluctant to assume this systematically because, in practice,
implementations of logical or typing systems do not maintain this invariant.


4.1.5 Basic properties of the typing system. We state here some basic
properties of the typing system, which will be used later on. First, the following
variants of the structural rules (see section 2.2.10) hold.

Lemma 4.1.5.1 (Weakening rule). The weakening rule is admissible

\[ \frac{\Gamma, \Gamma' \vdash t : B}{\Gamma, x : A, \Gamma' \vdash t : B}\;(\text{wk}) \]

provided that $x \notin \mathrm{dom}(\Gamma)$.


Proof. By induction on the derivation of $\Gamma, \Gamma' \vdash t : B$. The case of the axiom rule
uses the fact that we can suppose that $x \notin \mathrm{dom}(\Gamma)$, since we are considering
sequents up to α-conversion, see remark 4.1.4.2.


Lemma 4.1.5.2 (Exchange rule). The exchange rule is admissible

\[ \frac{\Gamma, x : A, y : B, \Gamma' \vdash t : C}{\Gamma, y : B, x : A, \Gamma' \vdash t : C}\;(\text{xch}) \]

provided that $x \neq y$.


Proof. By induction on the derivation of the premise.


Lemma 4.1.5.3 (Contraction rule). The contraction rule is admissible:

\[ \frac{\Gamma, x : A, y : A, \Gamma' \vdash t : B}{\Gamma, x : A, \Gamma' \vdash t[x/y] : B}\;(\text{contr}) \]


Proof. By induction on the derivation of the premise.


All the free variables of a typable term are declared in the context:

Lemma 4.1.5.4. Given a sequent $\Gamma \vdash t : A$ which is derivable, we have $\mathrm{FV}(t) \subseteq \mathrm{dom}(\Gamma)$.


Proof. By induction on the derivation of the sequent.


In particular, a term $t$ typable in the empty context is necessarily closed,
i.e. $\mathrm{FV}(t) = \emptyset$. Conversely, a variable which does not occur in the term can
be removed:


Lemma 4.1.5.5. Given a derivable sequent $\Gamma, x : A, \Gamma' \vdash t : B$ with $x \notin \mathrm{FV}(t)$,
the sequent $\Gamma, \Gamma' \vdash t : B$ is also derivable.


Proof. By induction on the derivation of the sequent.


4.1.6 Type checking, type inference and typability. The three most
important algorithmic questions when considering a typing system are the
following ones.


— The type checking problem consists, given a context $\Gamma$, a term $t$ and a
type $A$, in deciding whether $t$ has type $A$ in context $\Gamma$.


— The type inference problem consists, given a context $\Gamma$ and a term $t$ which
is typable in the context $\Gamma$, in finding a type $A$ such that $t$ has type $A$ in
context $\Gamma$.


— The typability problem consists, given a context $\Gamma$ and a term $t$, in deciding
whether $t$ admits a type in this context.

In simply-typed λ-calculus all three problems are very easy: they can
be answered in linear time over the size of the term $t$ (neglecting the size of $\Gamma$):


Theorem 4.1.6.1 (Uniqueness of typing). Given a context $\Gamma$ and a term $t$ there
is at most one type $A$ such that $t$ has type $A$ in the context $\Gamma$ and at most one
derivation of $\Gamma \vdash t : A$.


Proof. By induction on the term $t$. We have the following cases depending on
its shape:


— if the term is of the form $x$ then it is typable iff $x \in \mathrm{dom}(\Gamma)$ and in this
case the typing derivation is

\[ \frac{}{\Gamma \vdash x : A}\;(\text{ax}) \]

with $A = \Gamma(x)$,


— if the term is of the form $t\,u$ then it is typable iff both $t$ and $u$ are typable
in $\Gamma$, with respective types of the form $A \to B$ and $A$, and in this case the
typing derivation is

\[ \frac{\Gamma \vdash t : A \to B \qquad \Gamma \vdash u : A}{\Gamma \vdash t\,u : B}\;(\to_E) \]


— if the term is of the form $\lambda x^A.t$ then it is typable iff $t$ is typable in
context $\Gamma, x : A$ with some type $B$, and in this case the typing derivation
is

\[ \frac{\Gamma, x : A \vdash t : B}{\Gamma \vdash \lambda x^A.t : A \to B}\;(\to_I) \]


This concludes the proof.


The above theorem allows one to speak of “the” type and “the” typing derivation
of a typable term. Moreover, its proof is constructive, in the sense that it allows
one to explicitly construct the type of a term when it exists (i.e. perform type
inference) and to determine that the term admits no type otherwise (i.e. decide
typability), by induction on the term. Since a term admits at most one
type in a given context, the type checking problem can be reduced to type inference: in a given
context, a term $t$ admits a type $A$ if and only if the type inferred for $t$ is $A$.


Implementation. An implementation is provided in figure 4.2.


— The function infer infers the type of a given term in a given context env,
which is a list of pairs consisting of a variable and a type, encoding the
typing context $\Gamma$ (in reverse order). Depending on whether the term is
a variable, an abstraction or an application, the function will recursively
look for a typing derivation, using the rules (ax), $(\to_I)$ and $(\to_E)$ respectively. The
function raises the exception Type_error when no such type exists.

— The function check performs type checking: given an environment env, a
term t and a type a, it returns () if the term admits the given type and
raises Type_error otherwise. Following the remark above, it simply compares
the given type with the inferred one.


— The function typable determines whether a term admits a type or not
in a given environment.
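
As an illustration, the functions of figure 4.2 can be exercised on the term of example 4.1.4.1. The definitions below repeat those of the figure in order to make the example self-contained (with `typable` written slightly more compactly); the names `a`, `twice` and `omega` are hypothetical.

```ocaml
type var = string
type ty = TVar of string | Arr of ty * ty
type term = Var of var | App of term * term | Abs of var * ty * term

exception Type_error

let rec infer env = function
  | Var x -> (try List.assoc x env with Not_found -> raise Type_error)
  | Abs (x, a, t) -> Arr (a, infer ((x, a) :: env) t)
  | App (t, u) ->
    (match infer env t with
     | Arr (a, b) -> check env u a; b
     | _ -> raise Type_error)

and check env t a =
  if infer env t <> a then raise Type_error

let typable env t =
  try ignore (infer env t); true
  with Type_error -> false

let a = TVar "A"

(* The term of example 4.1.4.1: it takes f and x, and applies f twice to x. *)
let twice =
  Abs ("f", Arr (a, a), Abs ("x", a, App (Var "f", App (Var "f", Var "x"))))

(* A self-application, which cannot be typed. *)
let omega = Abs ("x", a, App (Var "x", Var "x"))
```

Here, `infer [] twice` returns the representation of the type $(A \to A) \to A \to A$, whereas `typable [] omega` is `false` because the variable `x`, of type $A$, is used as a function.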


4.1.7 The Curry-Howard correspondence. The presentation and naming
of the rules of section 4.1.4 is intended to make clear the relation with logic:
if we erase the term annotations and replace $\to$ by $\Rightarrow$, we obtain precisely the
rules of the implicational fragment of intuitionistic logic, see section 2.2.6. This
parallel between the typing rules (on the left) and the rules in natural deduction
(on the right) is shown in the table below:

\[
\begin{array}{ccc}
\dfrac{}{\Gamma \vdash x : \Gamma(x)}\;(\text{ax}) & \qquad & \dfrac{}{\Gamma, A, \Gamma' \vdash A}\;(\text{ax}) \\[3ex]
\dfrac{\Gamma, x : A \vdash t : B}{\Gamma \vdash \lambda x^A.t : A \to B}\;(\to_I) & & \dfrac{\Gamma, A \vdash B}{\Gamma \vdash A \Rightarrow B}\;(\Rightarrow_I) \\[3ex]
\dfrac{\Gamma \vdash t : A \to B \qquad \Gamma \vdash u : A}{\Gamma \vdash t\,u : B}\;(\to_E) & & \dfrac{\Gamma \vdash A \Rightarrow B \qquad \Gamma \vdash A}{\Gamma \vdash B}\;(\Rightarrow_E)
\end{array}
\]

If we start from a typing derivation, we obtain a derivation in NJ by erasing
the terms, i.e. replacing a rule on the left column above by the corresponding
rule on the right column: this process is called here the term erasing
procedure. Abstractions thus correspond to introduction rules of $\Rightarrow$, applications
to elimination rules of $\Rightarrow$, and variables to axiom rules. In fact, the
relationship between typable terms and proofs in NJ is very tight: this is known as the
Curry-Howard correspondence, also called proofs-as-programs correspondence or
propositions-as-types correspondence. It was first explicitly stated by Howard
in notes which circulated starting from 1969 and were ultimately published a
decade later [How80]. The name of Curry is due to his closely related discovery
of the correspondence between Hilbert calculus and combinatory logic [CF58]
in the late 50s, detailed in section 4.5. We recall that natural deduction [Gen35]
and simply typed λ-calculus [Chu40] were introduced in 1935 and 1940: this
correspondence might look like an obvious fact once the concepts are properly
elaborated and the right notations set up, but it took 30 years to get there.


Theorem 4.1.7.1 (Curry-Howard correspondence). Given a context $\Gamma$ and a
type $A$, the term erasing procedure induces a one-to-one correspondence
between


(i) λ-terms of type $A$ in the context $\Gamma$, and
(ii) proofs in the implicational fragment of NJ of $\Gamma \vdash A$.


Proof. Suppose given a proof $\pi$ of a sequent. We construct a term $t$ having this
proof as typing derivation by induction on the derivation:

type var = string

(** Types. *)
type ty =
  | TVar of string
  | Arr of ty * ty

(** Terms. *)
type term =
  | Var of var
  | App of term * term
  | Abs of var * ty * term

exception Type_error

(** Type inference. *)
let rec infer env = function
  | Var x ->
    (try List.assoc x env with Not_found -> raise Type_error)
  | Abs (x, a, t) ->
    Arr (a, infer ((x,a)::env) t)
  | App (t, u) ->
    match infer env t with
    | Arr (a, b) -> check env u a; b
    | _ -> raise Type_error

(** Type checking. *)
and check env t a =
  if infer env t <> a then raise Type_error

(** Typability. *)
let typable env t =
  try let _ = infer env t in true
  with Type_error -> false

Figure 4.2: Type checking, type inference and typability.

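— if the proof is of the form

\[ \frac{}{\Gamma, A, \Gamma' \vdash A}\;(\text{ax}) \]

then necessarily the corresponding typing derivation is

\[ \frac{}{\Gamma, x : A, \Gamma' \vdash x : A}\;(\text{ax}) \]


— if the proof is of the form

\[ \frac{\dfrac{\pi}{\Gamma \vdash A \Rightarrow B} \qquad \dfrac{\pi'}{\Gamma \vdash A}}{\Gamma \vdash B}\;(\Rightarrow_E) \]

then by induction hypothesis we have terms $t$ and $u$ with typing
derivations

\[ \Gamma \vdash t : A \to B \qquad\qquad \Gamma \vdash u : A \]

and necessarily the typing derivation is

\[ \frac{\Gamma \vdash t : A \to B \qquad \Gamma \vdash u : A}{\Gamma \vdash t\,u : B}\;(\to_E) \]


— if the proof is of the form

\[ \frac{\dfrac{\pi}{\Gamma, A \vdash B}}{\Gamma \vdash A \Rightarrow B}\;(\Rightarrow_I) \]

then by induction hypothesis we have a typing derivation

\[ \Gamma, x : A \vdash t : B \]

and necessarily the typing derivation is of the form

\[ \frac{\Gamma, x : A \vdash t : B}{\Gamma \vdash \lambda x^A.t : A \to B}\;(\to_I) \]


Conversely, given a term of type $A$ in the context $\Gamma$, theorem 4.1.6.1 ensures
that there is at most one typing derivation for it, and erasing it provides a proof
of $\Gamma \vdash A$. Finally, it is easily shown that both translations establish a bijective
correspondence.


In the light of the previous theorem, typable λ-terms can be thought of as
witnesses for proofs.
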
Remark 4.1.7.2 (Contexts as sets). The two λ-terms $\lambda x^A.\lambda y^A.x$ and $\lambda x^A.\lambda y^A.y$
both have the type $A \to A \to A$:

\[
\dfrac{\dfrac{\dfrac{}{x : A, y : A \vdash x : A}\,(\text{ax})}{x : A \vdash \lambda y^A.x : A \to A}\,(\to_I)}{\vdash \lambda x^A.\lambda y^A.x : A \to A \to A}\,(\to_I)
\qquad\qquad
\dfrac{\dfrac{\dfrac{}{x : A, y : A \vdash y : A}\,(\text{ax})}{x : A \vdash \lambda y^A.y : A \to A}\,(\to_I)}{\vdash \lambda x^A.\lambda y^A.y : A \to A \to A}\,(\to_I)
\]

and they are clearly different (they respectively correspond to the first and the
second projection). This sheds a new light on our remark of section 2.2.10,
stating that contexts should be lists and not sets in proof systems. If we
handled them as sets, we would not be able to distinguish them since both would
correspond, via the “Curry-Howard correspondence”, to the proof

\[
\dfrac{\dfrac{\dfrac{}{A \vdash A}\,(\text{ax})}{A \vdash A \Rightarrow A}\,(\Rightarrow_I)}{\vdash A \Rightarrow A \Rightarrow A}\,(\Rightarrow_I)
\]

In other words, it is important, in axiom rules, to know exactly which hypothesis
we are using in the context when there are two of the same type.


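Remark 4.1.7.3 (Equivalence vs isomorphism). In the same vein as the previous
remark, there is a difference between equivalence and isomorphism in type theory.
For instance, we have an equivalence

\[ (A \Rightarrow A \Rightarrow B) \Leftrightarrow (A \Rightarrow B) \]

but the types $A \to A \to B$ and $A \to B$ are not isomorphic. The equivalence
amounts to having terms corresponding to both implications of the equivalence:

\[ t : (A \to A \to B) \to (A \to B) \qquad\qquad u : (A \to B) \to (A \to A \to B) \]
Here, we can take
\[ t = \lambda f^{A \to A \to B}.\lambda x^A.f\,x\,x \qquad\qquad u = \lambda f^{A \to B}.\lambda x^A.\lambda y^A.f\,x \]

Such a pair of terms is an isomorphism when both composites are (βη-equivalent
to) the identity:
\[ \lambda f^{A \to B}.t\,(u\,f) =_{\beta\eta} \lambda f^{A \to B}.f \qquad\qquad \lambda f^{A \to A \to B}.u\,(t\,f) =_{\beta\eta} \lambda f^{A \to A \to B}.f \]

In the above example, the first equality does hold, but not the second since

\[ \lambda f^{A \to A \to B}.u\,(t\,f) =_{\beta} \lambda f^{A \to A \to B}.\lambda x^A.\lambda y^A.f\,x\,x \neq_{\beta\eta} \lambda f^{A \to A \to B}.f \]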
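
The two terms of this remark can be written directly in OCaml, which makes it easy to observe that only one of the two composites behaves as the identity. This is only a sketch: the functions `f` and `g` are arbitrary test functions introduced for the illustration.

```ocaml
(* t : ('a -> 'a -> 'b) -> 'a -> 'b *)
let t f x = f x x

(* u : ('a -> 'b) -> 'a -> 'a -> 'b *)
let u f x _ = f x

(* Arbitrary functions used to test the composites. *)
let f x = x + 1
let g x y = x - y

(* t (u f) behaves as f. *)
let first = t (u f) 3

(* u (t g) forgets its second argument and thus differs from g. *)
let second = u (t g) 5 2
```

Here, `first` is equal to `f 3`, whereas `second` is `g 5 5`, i.e. `0`, which differs from `g 5 2`, i.e. `3`: the second argument has been lost in the process.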


4.1.8 Subject reduction. An important property, relating typing and
β-reduction in the λ-calculus, is the subject reduction property, already encountered
in theorem 1.4.3.2: typing does not change during evaluation, by which we mean
here β-reduction. We first need an auxiliary lemma:

Lemma 4.1.8.1 (Substitution lemma). Suppose that we have typing derivations
of
\[ \Gamma, x : A, \Gamma' \vdash t : B \qquad\text{and}\qquad \Gamma, \Gamma' \vdash u : A \]

then we have a typing derivation of $\Gamma, \Gamma' \vdash t[u/x] : B$. In other words, the rule

\[ \frac{\Gamma, x : A, \Gamma' \vdash t : B \qquad \Gamma, \Gamma' \vdash u : A}{\Gamma, \Gamma' \vdash t[u/x] : B} \]

is admissible.

Proof. By induction on the typing derivation of $\Gamma, x : A, \Gamma' \vdash t : B$.


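— If it is of the form

\[ \frac{}{\Gamma, x : A, \Gamma' \vdash x : A}\;(\text{ax}) \]

then we conclude with the derivation of $\Gamma, \Gamma' \vdash u : A$ in the hypothesis.


— If it is of the form

\[ \frac{}{\Gamma, x : A, \Gamma' \vdash y : B}\;(\text{ax}) \]

where $x \neq y$ and $y : B$ occurs in $\Gamma$ or $\Gamma'$, then we conclude with

\[ \frac{}{\Gamma, \Gamma' \vdash y : B}\;(\text{ax}) \]


— If it is of the form

\[ \frac{\dfrac{\pi_1}{\Gamma, x : A, \Gamma' \vdash t : B \to C} \qquad \dfrac{\pi_2}{\Gamma, x : A, \Gamma' \vdash t' : B}}{\Gamma, x : A, \Gamma' \vdash t\,t' : C}\;(\to_E) \]

then we conclude with

\[ \frac{\dfrac{\pi_1'}{\Gamma, \Gamma' \vdash t[u/x] : B \to C} \qquad \dfrac{\pi_2'}{\Gamma, \Gamma' \vdash t'[u/x] : B}}{\Gamma, \Gamma' \vdash (t[u/x])\,(t'[u/x]) : C}\;(\to_E) \]

where $\pi_1'$ and $\pi_2'$ are respectively obtained from $\pi_1$ and $\pi_2$ by induction
hypothesis.


— If it is of the form

\[ \frac{\dfrac{\pi}{\Gamma, x : A, \Gamma', y : B \vdash t : C}}{\Gamma, x : A, \Gamma' \vdash \lambda y^B.t : B \to C}\;(\to_I) \]

then we conclude with

\[ \frac{\dfrac{\pi'}{\Gamma, \Gamma', y : B \vdash t[u/x] : C}}{\Gamma, \Gamma' \vdash \lambda y^B.t[u/x] : B \to C}\;(\to_I) \]

where $\pi'$ is obtained from

\[ \dfrac{\pi}{\Gamma, x : A, \Gamma', y : B \vdash t : C} \qquad\text{and}\qquad \dfrac{\Gamma, \Gamma' \vdash u : A}{\Gamma, \Gamma', y : B \vdash u : A}\;(\text{wk}) \]

by induction hypothesis.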


Remark 4.1.8.2. Note that, through the Curry-Howard correspondence, the
substitution lemma precisely corresponds to the “proof substitution” of
proposition 2.3.2.1: the term erasure of the rule of lemma 4.1.8.1 is the cut rule

\[ \frac{\Gamma, A, \Gamma' \vdash B \qquad \Gamma, \Gamma' \vdash A}{\Gamma, \Gamma' \vdash B}\;(\text{cut}) \]

It should not be a surprise: under the Curry-Howard correspondence,
substituting proofs corresponds to substituting terms.

Theorem 4.1.8.3 (Subject reduction). Suppose given a term $t$ of type $A$ in a
context $\Gamma$. If $t$ β-reduces to $t'$ then $t'$ also has type $A$ in the context $\Gamma$.


Proof. By induction on the derivation of $t \longrightarrow_\beta t'$ (see section 3.2.1).

— If the derivation ends with $(\beta_s)$, it is of the form
\[ (\lambda x^A.t)\,u \longrightarrow_\beta t[u/x] \]

and the typing derivation of the term on the left is of the form

\[ \frac{\dfrac{\Gamma, x : A \vdash t : B}{\Gamma \vdash \lambda x^A.t : A \to B}\,(\to_I) \qquad \Gamma \vdash u : A}{\Gamma \vdash (\lambda x^A.t)\,u : B}\;(\to_E) \]

We conclude by lemma 4.1.8.1 which ensures the existence of a derivation
of the form

\[ \Gamma \vdash t[u/x] : B \]


— If the derivation ends with $(\beta_l)$, it is of the form
\[ t\,u \longrightarrow_\beta t'\,u \]
with $t \longrightarrow_\beta t'$, and the typing derivation of the term on the left is of the
form
\[ \frac{\dfrac{\pi_1}{\Gamma \vdash t : A \to B} \qquad \dfrac{\pi_2}{\Gamma \vdash u : A}}{\Gamma \vdash t\,u : B}\;(\to_E) \]
We conclude with the derivation
\[ \frac{\dfrac{\pi_1'}{\Gamma \vdash t' : A \to B} \qquad \dfrac{\pi_2}{\Gamma \vdash u : A}}{\Gamma \vdash t'\,u : B}\;(\to_E) \]

where $\pi_1'$ is obtained by induction hypothesis.

— The cases of $(\beta_r)$ and $(\beta_\lambda)$ are similar to the previous one.


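Example 4.1.8.4. We have the typing derivation

\[
\dfrac{\dfrac{\dfrac{\dfrac{}{x : A, y : A \vdash y : A}\,(\text{ax})}{x : A \vdash \lambda y^A.y : A \to A}\,(\to_I) \qquad \dfrac{}{x : A \vdash x : A}\,(\text{ax})}{x : A \vdash (\lambda y^A.y)\,x : A}\,(\to_E)}{\vdash \lambda x^A.(\lambda y^A.y)\,x : A \to A}\,(\to_I)
\]

and the reduction
\[ \lambda x^A.(\lambda y^A.y)\,x \longrightarrow_\beta \lambda x^A.x \]

It can be checked that the reduced term does admit the same type $A \to A$:

\[
\dfrac{\dfrac{}{x : A \vdash x : A}\,(\text{ax})}{\vdash \lambda x^A.x : A \to A}\,(\to_I)
\]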
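
This example can also be checked mechanically. The sketch below combines a type inference function in the style of figure 4.2 with a one-step reduction function. It relies on a simplifying assumption: the substitution `subst` is naive and does not rename bound variables, so that it is only correct when no free variable of the substituted term gets captured, which is the case here. The names `subst`, `step`, `before` and `after` are hypothetical.

```ocaml
type ty = TVar of string | Arr of ty * ty
type term = Var of string | App of term * term | Abs of string * ty * term

exception Type_error

(* Type inference. *)
let rec infer env = function
  | Var x -> (try List.assoc x env with Not_found -> raise Type_error)
  | Abs (x, a, t) -> Arr (a, infer ((x, a) :: env) t)
  | App (t, u) ->
    (match infer env t with
     | Arr (a, b) when infer env u = a -> b
     | _ -> raise Type_error)

(* Naive substitution of u for x (assumption: no variable capture occurs). *)
let rec subst x u = function
  | Var y -> if y = x then u else Var y
  | App (t, t') -> App (subst x u t, subst x u t')
  | Abs (y, a, t) -> if y = x then Abs (y, a, t) else Abs (y, a, subst x u t)

(* One step of beta-reduction, reducing the leftmost redex, if there is one. *)
let rec step = function
  | App (Abs (x, _, t), u) -> Some (subst x u t)
  | App (t, u) ->
    (match step t with
     | Some t' -> Some (App (t', u))
     | None -> Option.map (fun u' -> App (t, u')) (step u))
  | Abs (x, a, t) -> Option.map (fun t' -> Abs (x, a, t')) (step t)
  | Var _ -> None

let a = TVar "A"

(* The term of example 4.1.8.4 and its reduct. *)
let before = Abs ("x", a, App (Abs ("y", a, Var "y"), Var "x"))
let after = step before
```

The value `after` is `Some (Abs ("x", a, Var "x"))`, and `infer` returns the type $A \to A$ both for `before` and for this reduct, as predicted by the subject reduction theorem.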


The proof of the above theorem deserves some attention. It should be observed
that, by erasing the terms, the β-reduction of a typable term described in the
above proof corresponds precisely to the procedure we used in section 2.3.3 in
order to eliminate a cut in the corresponding proof:

\[
\dfrac{\dfrac{\dfrac{\pi}{\Gamma, A \vdash B}}{\Gamma \vdash A \Rightarrow B}\,(\Rightarrow_I) \qquad \dfrac{\pi'}{\Gamma \vdash A}}{\Gamma \vdash B}\,(\Rightarrow_E)
\qquad\rightsquigarrow\qquad
\dfrac{\pi[\pi'/A]}{\Gamma \vdash B}
\]

Thus,


Theorem 4.1.8.5 (Dynamical Curry-Howard correspondence). Through the
Curry-Howard correspondence, β-reduction corresponds to eliminating cuts.


This explains the remark already made in section 2.3.3: although cut-free proofs
are “simpler” in the sense that they do not contain cuts, they can be much
bigger than the corresponding proofs with cuts, in the same way that executing
a program can give rise to a much bigger result than the program itself (e.g. a
program computing the factorial of 1000). As a direct consequence of the previous
theorem, we have that


Corollary 4.1.8.6. Through the Curry-Howard correspondence, typable terms
in normal form correspond to cut-free proofs.


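4.1.9 η-expansion. We have seen that a β-reduction step corresponds to
eliminating a cut, which consists of an introduction rule followed by an elimination
rule, when reading the proof from top to bottom. Similarly, an η-expansion
step corresponds to introducing a “co-cut” (we are not aware of an official name
for those) consisting of an elimination rule followed by an introduction rule.
For instance, supposing that in some context $\Gamma$ we can show $t : A \to B$, the
η-expansion step
\[ t \longrightarrow_\eta \lambda x^A.t\,x \]
corresponds to the following transformation of typing derivation

\[
\dfrac{\pi}{\Gamma \vdash t : A \to B}
\qquad\rightsquigarrow\qquad
\dfrac{\dfrac{\dfrac{\dfrac{\pi}{\Gamma \vdash t : A \to B}}{\Gamma, x : A \vdash t : A \to B}\,(\text{wk}) \qquad \dfrac{}{\Gamma, x : A \vdash x : A}\,(\text{ax})}{\Gamma, x : A \vdash t\,x : B}\,(\to_E)}{\Gamma \vdash \lambda x^A.t\,x : A \to B}\,(\to_I)
\]

which, after term erasure, corresponds to the following proof transformation

\[
\dfrac{\pi}{\Gamma \vdash A \Rightarrow B}
\qquad\rightsquigarrow\qquad
\dfrac{\dfrac{\dfrac{\dfrac{\pi}{\Gamma \vdash A \Rightarrow B}}{\Gamma, A \vdash A \Rightarrow B}\,(\text{wk}) \qquad \dfrac{}{\Gamma, A \vdash A}\,(\text{ax})}{\Gamma, A \vdash B}\,(\Rightarrow_E)}{\Gamma \vdash A \Rightarrow B}\,(\Rightarrow_I)
\]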

4.1.10 Confluence. Recall from section 3.4 that β-reduction of λ-terms is
confluent. By theorem 4.1.8.3, we can immediately extend this result to typable
terms:


Theorem 4.1.10.1 (Confluence). The β-reduction is confluent on typable terms
(in some fixed context): given typable terms $t$, $u_1$ and $u_2$ such that $t \longrightarrow_\beta^* u_1$
and $t \longrightarrow_\beta^* u_2$, there exists a typable term $v$ such that $u_1 \longrightarrow_\beta^* v$ and $u_2 \longrightarrow_\beta^* v$.


4.2 Strong normalization 


4.2.1 A normalization strategy. We have seen in theorem 4.1.8.5 that, under
the Curry-Howard correspondence, β-reduction corresponds to cut elimination.
Since, in theorem 2.3.3.1, we have established that every proof reduces to a
cut-free proof, this means that every typable term β-reduces to a term in
normal form. More precisely, the proof produces a strategy to reduce a term to a
normal form: we can reduce a β-redex $(\lambda x.t)\,u$ whenever $t$ and $u$ do not contain
β-redexes. In fact, the proof only depends on the hypothesis that $u$ does not
contain β-redexes, and we have to suppose this because those redexes could be
“duplicated” during the reduction, making it unclear that it will terminate. For
instance, writing $I = \lambda x.x$ for the identity, with $t = \lambda y.y\,x\,x$ and $u = I\,I$, we have
the following reductions:

\[
\begin{array}{ccccc}
(\lambda xy.y\,x\,x)\,(I\,I) & \longrightarrow & (\lambda xy.y\,x\,x)\,I & & \\
\downarrow & & & & \downarrow \\
\lambda y.y\,(I\,I)\,(I\,I) & \longrightarrow & \lambda y.y\,I\,(I\,I) & \longrightarrow & \lambda y.y\,I\,I
\end{array}
\]

We see that the redex $I\,I \longrightarrow_\beta I$ on the top line has become two redexes in the
bottom line: this is because the term $\lambda xy.y\,x\,x$ contains the variable $x$ twice and
the vertical reduction will thus cause the term substituted for $x$ to be duplicated.
Following the terminology introduced in section 3.5.1, what theorem 2.3.3.1
establishes is thus that the innermost reduction strategies, such as call-by-value,
terminate for typable λ-terms.

4.2.2 Strong normalization. We would now like to show a stronger result
called strong normalization: every typable term is strongly normalizing. This
means that, starting from a given typable term $t$, we will always end up with a
normal form after a finite number of steps, whichever way we choose to reduce
it, see section 3.2.6. We show below a proof based on “reducibility candidates”
which is due to Tait [Tai75] and later refined by Girard, see [Gir89, Chapter 6].
Before entering the details of this subtle proof, let us first explain why the naive
ideas for a proof do not work.


Failure of the naive proof. A first attempt to show the result would consist
in showing that, for any derivable sequent $\Gamma \vdash t : A$, the term $t$ is strongly
normalizing by induction on the derivation of the sequent.


— For the rule (ax), this is obvious since a variable is strongly normalizing
(it is even a normal form).


— For the rule $(\to_I)$, we have to show that a term $\lambda x.t$ is strongly
normalizing knowing that $t$ is strongly normalizing. A sequence of reductions
starting from $\lambda x.t$ is of the form $\lambda x.t \longrightarrow_\beta \lambda x.t_1 \longrightarrow_\beta \lambda x.t_2 \longrightarrow_\beta \ldots$ with
$t \longrightarrow_\beta t_1 \longrightarrow_\beta t_2 \longrightarrow_\beta \ldots$, and is thus finite since $t$ is strongly normalizing
by induction hypothesis.


— For the rule $(\to_E)$, we have to show that a term $t\,u$ is strongly normalizing
knowing that both $t$ and $u$ are strongly normalizing. However, a reduction
in $t\,u$ is not necessarily generated by a reduction in $t$ or in $u$ in the case
where $t$ is an abstraction, and we cannot conclude.


If we try to identify the cause of the failure, we see that we do not really use the
fact that the terms are typable in the last case. We are left proving that if $t$ and
$u$ are strongly normalizable then $t\,u$ is strongly normalizable, and there is a counter-example to
that, already encountered in section 3.2.6: take $t = \lambda x.x\,x$ and $u = \lambda x.x\,x$, both
are strongly normalizable, but $t\,u$ is not since it leads to an infinite sequence of
reductions. This however is not a counter-example to the strong normalizability
property, because $\lambda x.x\,x$ cannot be typed, but we have no easy way of exploiting
this fact.


Reducibility candidates. Instead, we now take an “optimistic” approach and,
given a type $A$, we define a set $R_A$ of terms, called the reducibility candidates
at $A$, which are terms such that


(i) for every term $t$ such that $\Gamma \vdash t : A$ is derivable, we have $t \in R_A$,
(ii) a term $t$ in $R_A$ is “obviously” strongly normalizing.

This will allow us to conclude immediately, once we have shown those
properties.
The definition is performed by induction on the type $A$ by


— for a type variable $X$, $R_X$ is the set of all strongly normalizable terms $t$,


— for an arrow type $A \to B$, $R_{A \to B}$ is the set of terms $t$ such that for every
$u \in R_A$, we have $t\,u \in R_B$.

In the first case, we have not been particularly subtle: we wanted a set of
strongly normalizable terms which contains all the terms of type $X$, and we
simply took all strongly normalizable terms. However, in the second case, we
have crafted our definition to avoid the previous problem: in the case of the
rule $(\to_E)$, it will be obvious how to deduce, given $t \in R_{A \to B}$ and $u \in R_A$,
that $t\,u \in R_B$. However, it is not obvious that every term in $R_{A \to B}$ is strongly
normalizing and we will have to prove that. A term is said to be reducible when
it belongs to a set of reducibility candidates $R_A$ for some type $A$.

We begin by showing that every term $t \in R_A$ is strongly normalizing by
induction on the type $A$, but in order to do so we need to strengthen the
induction hypothesis and show together additional properties on $A$. A term is
neutral when it is not an abstraction; in other words, a neutral term is of the
form $t\,u$ or $x$.


Proposition 4.2.2.1. Given a type $A$ and a term $t$, we have

(CR1) if $t \in R_A$ then $t$ is strongly normalizing,

(CR2) if $t \in R_A$ and $t \longrightarrow_\beta t'$ then $t' \in R_A$,

(CR3) if $t$ is neutral, and $t \longrightarrow_\beta t'$ implies $t' \in R_A$, then $t \in R_A$.


Proof. Consider a term $t$. We show simultaneously the three properties by
induction on $A$. In the base case, the type $A$ is a type variable $X$.


(CR1) If $t \in R_X$ then it is strongly normalizable by definition of $R_X$.


(CR2) Suppose that $t \in R_X$ (i.e. $t$ is strongly normalizing) and $t \longrightarrow_\beta t'$. Every
sequence of reductions $t' \longrightarrow_\beta \ldots$ starting from $t'$ can be extended as
a sequence of reductions $t \longrightarrow_\beta t' \longrightarrow_\beta \ldots$ starting from $t$, and is thus
finite. Therefore $t'$ is strongly normalizing and thus belongs to $R_X$.


(CR3) Suppose that $t$ is neutral and such that for every term $t'$ such that
$t \longrightarrow_\beta t'$ we have $t' \in R_X$. A sequence of reductions $t \longrightarrow_\beta t' \longrightarrow_\beta \ldots$
starting from $t$ is such that $t' \in R_X$, and is thus finite. Therefore $t \in R_X$.


Consider the case of an arrow type $A \to B$.


(CR1) Suppose that $t \in R_{A \to B}$, i.e. for every $u \in R_A$ we have $t\,u \in R_B$. A
variable $x$ is neutral and a normal form and thus belongs to $R_A$ by (CR3).
By definition of $R_{A \to B}$, we have $t\,x \in R_B$. Any sequence of reductions
$t \longrightarrow_\beta t' \longrightarrow_\beta \ldots$ induces a sequence of reductions $t\,x \longrightarrow_\beta t'\,x \longrightarrow_\beta \ldots$
and is thus finite by (CR1) on $B$. Thus $t$ is strongly normalizing.


(CR2) Suppose that $t \in R_{A \to B}$ and $t \longrightarrow_\beta t'$. Given a term $u \in R_A$, by
definition of $R_{A \to B}$, we have $t\,u \in R_B$. Since $t\,u \longrightarrow_\beta t'\,u$, by (CR2)
on $B$, we have $t'\,u \in R_B$. Therefore $t' \in R_{A \to B}$.


(CR3) Suppose that $t$ is neutral and such that, for every term $t'$ with $t \longrightarrow_\beta t'$,
we have $t' \in R_{A \to B}$. Suppose given a term $u \in R_A$. By (CR1) on $A$, the
term $u$ is strongly normalizing and we can show that $t\,u \in R_B$ for every
term $u \in R_A$ by well-founded induction on $u$ (theorem A.3.2.1). Since $t$
is neutral, the term $t\,u$ can only reduce in two ways.

— If $t\,u \longrightarrow_\beta t'\,u$ with $t \longrightarrow_\beta t'$ then $t'\,u \in R_B$ because, by hypothesis, we have
$t' \in R_{A \to B}$.

— If $t\,u \longrightarrow_\beta t\,u'$ with $u \longrightarrow_\beta u'$ then $u' \in R_A$ by (CR2) on $A$ and,
by induction hypothesis on $u$, we have $t\,u' \in R_B$.


Therefore, since the term $t\,u$ is neutral, by (CR3) on $B$, we have $t\,u \in R_B$. We conclude that
$t \in R_{A \to B}$.

We now hope to be able to show that for every derivable sequent Γ ⊢ t : A,
we have t ∈ R_A, by induction on the derivation. The case (ax) is easily handled
(we have seen in the previous proof that variables belong to all sets R_A) and
the case (→E) is immediate by definition of R_{A→B}. However, the case of (→I)
does not go through: from the hypothesis t ∈ R_B, we would need to deduce
that λx.t ∈ R_{A→B}, i.e. that (λx.t) u ∈ R_B for every u ∈ R_A. Since we have
(λx.t) u ⟶β t[u/x], this suggests proving by induction that t[u/x] ∈ R_B instead
of t ∈ R_B (which is a particular case since t = t[x/x]) or, even more generally,
lemma 4.2.2.3 below. We begin with the following lemma, which is used in its
proof.


Lemma 4.2.2.2. Suppose given a term t such that t[u/x] ∈ R_B for every term u ∈ R_A.
Then λx^A.t ∈ R_{A→B}.


Proof. We have seen that x ∈ R_A by (CR3) and thus t = t[x/x] belongs to R_B.
Given u ∈ R_A, we have to show (λx^A.t) u ∈ R_B. By (CR1), the terms t and
u are strongly normalizing. We can thus show (λx^A.t) u ∈ R_B by induction on
the pair (t, u). The term (λx^A.t) u can either reduce to


— t[u/x], which is in R_B by hypothesis,
— (λx^A.t′) u with t ⟶β t′, which is in R_B by induction hypothesis,
— (λx^A.t) u′ with u ⟶β u′, which is in R_B by induction hypothesis.


In every case, the neutral term (λx^A.t) u reduces to a term in R_B and therefore
belongs to R_B by (CR3).


Lemma 4.2.2.3. Suppose given a term t such that Γ ⊢ t : A is derivable for some
context Γ = x_1 : A_1, ..., x_n : A_n and type A. Then, for all terms t_i ∈ R_{A_i},
for 1 ≤ i ≤ n, we have t[t_1/x_1, ..., t_n/x_n] ∈ R_A.


Proof. We write t[t_*/x_*] for the above substitution, and show the result by
induction on the derivation of Γ ⊢ t : A.


— If the last rule is

\[ \frac{}{\Gamma \vdash x_i : A_i}\ (\mathrm{ax}) \]

then, for all terms t_i ∈ R_{A_i}, we have t[t_*/x_*] = t_i ∈ R_{A_i}.

— If the last rule is

\[ \frac{\Gamma \vdash u : A \to B \qquad \Gamma \vdash v : A}{\Gamma \vdash u\,v : B}\ (\to_E) \]

then, for all terms t_i ∈ R_{A_i}, by induction hypothesis, we have
u[t_*/x_*] ∈ R_{A→B} and v[t_*/x_*] ∈ R_A, and therefore we can conclude
t[t_*/x_*] = (u[t_*/x_*]) (v[t_*/x_*]) ∈ R_B.

— If the last rule is

\[ \frac{\Gamma, x : A \vdash u : B}{\Gamma \vdash \lambda x.u : A \to B}\ (\to_I) \]

then, by induction hypothesis, for all terms t_i ∈ R_{A_i} and for every
term v ∈ R_A, we have u[t_*/x_*][v/x] = u[t_*/x_*, v/x] ∈ R_B. Therefore, by
lemma 4.2.2.2, we have t[t_*/x_*] = λx.(u[t_*/x_*]) ∈ R_{A→B}.


Proposition 4.2.2.4 (Adequacy). Given a term t such that Γ ⊢ t : A is derivable,
we have t ∈ R_A.


Proof. We write Γ = x_1 : A_1, ..., x_n : A_n. A variable being neutral and in
normal form, by (CR3), we have x_i ∈ R_{A_i} for every index i. Therefore, by
lemma 4.2.2.3, t = t[x_*/x_*] ∈ R_A.


Theorem 4.2.2.5 (Strong normalization). Every typable term t is strongly nor- 
malizing. 


Proof. By proposition 4.2.2.4, the term t is reducible, and thus strongly nor- 
malizing by (CR1). 


One of the remarkable strengths of this approach is that it generalizes well to
usual extensions of simply typed λ-calculus, see section 4.3.7.


Remark 4.2.2.6. There are many possible variants on the definition of
reducibility candidates, see [Gal89]. The version presented here has the advantage of
being simple to define and leads to simple proofs. One of its drawbacks is that
the λ-terms of R_A are not necessarily of type A (for instance, when A = X any
strongly normalizable term belongs to R_A by definition). We can however define
a “typed variant” of reducibility candidates, by defining sets R_{Γ⊢A}, indexed by
both a context Γ and a type A, by induction on A by


— for a type variable X, R_{Γ⊢X} is the set of strongly normalizable terms t
such that Γ ⊢ t : X is derivable,


— for an arrow type A → B, R_{Γ⊢A→B} is the set of terms t such that
Γ ⊢ t : A → B is derivable and for every u ∈ R_{Γ⊢A}, we have t u ∈ R_{Γ⊢B}.


The expected adaptations of the above properties hold in this context, see the
formalization proposed in section 7.5.2. In particular, the variant of
proposition 4.2.2.4 ensures that every term t such that Γ ⊢ t : A is derivable belongs
to R_{Γ⊢A}; conversely, one easily shows by induction on A that every term t of
R_{Γ⊢A} is such that Γ ⊢ t : A is derivable. With this formulation, it thus turns
out that reducibility candidates are simply a complicated way of defining

\[ R_{\Gamma \vdash A} = \{ t \mid \Gamma \vdash t : A \text{ is derivable} \} \]

However, the way the definition is formulated allows us to perform the proofs by
induction!


4.2.3 First consequences. We shall now present some easy consequences of
the strong normalization theorem.

Non-typable terms. A first consequence of the strong normalization theorem 4.2.2.5
(or rather its contrapositive) is that there are terms which are not typable. For
instance, the λ-term Ω = (λx.xx)(λx.xx) is not typable because it is not
terminating, see section 3.2.6.


Termination of cut elimination. By theorem 4.1.8.5, cut-elimination in the
implicational fragment of natural deduction corresponds to β-reduction. Since
for typable terms β-reduction is always terminating (theorem 4.2.2.5), we have
shown


Theorem 4.2.3.1. The cut elimination procedure of section 2.3.3 always termi- 
nates (on a cut-free proof), whichever strategy we choose to eliminate the cuts. 


4.2.4 Deciding convertibility. In practice, the most important consequence
of the strong normalization theorem is that it provides us with an algorithm to
decide the β-convertibility of typable λ-terms t and u. Namely, suppose that
we start reducing t:

\[ t = t_0 \longrightarrow_\beta t_1 \longrightarrow_\beta t_2 \longrightarrow_\beta \cdots \]


This means that we start from t, reduce it to a term t_1, then reduce t_1 to a
term t_2, and so on. Note that we do not impose anything on the way t_{i+1}
is constructed from t_i: any reduction strategy would be acceptable. By
theorem 4.2.2.5, such a sequence cannot be infinite, which means that this process
will eventually give rise to a term t_n which cannot be reduced: t_n is a normal
form. This shows that there exists a term in normal form t̂ such that t ⟶*β t̂.
Similarly, u admits a normal form û. Clearly, t and u are β-convertible if and
only if t̂ and û are β-convertible. By proposition 3.4.4.3, this is the case if and
only if t̂ and û are equal.


We have thus reduced the problem of deciding whether two terms are convertible 
to deciding whether two terms are equal, which is easily done. Using the
functions defined in section 3.5, the following function eq tests for the β-equivalence
of two λ-terms which are supposed to be typable:


let eq t u = (normalize t) = (normalize u) 
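As a self-contained illustration, here is a minimal sketch of such a test. It does not reproduce the functions of section 3.5: the representation with de Bruijn indices (chosen so that structural equality coincides with α-equivalence), the helpers shift, subst and reduce, and the leftmost-outermost strategy are all assumptions made for this example. Typability is assumed and not checked, so the function may loop on a term such as Ω.

```ocaml
(* λ-terms with de Bruijn indices: Var 0 is the variable bound by the
   closest enclosing abstraction. *)
type term =
  | Var of int
  | App of term * term
  | Abs of term

(* Add d to the indices of the variables of t which are >= c. *)
let rec shift d c t =
  match t with
  | Var i -> if i >= c then Var (i + d) else Var i
  | App (t, u) -> App (shift d c t, shift d c u)
  | Abs t -> Abs (shift d (c + 1) t)

(* Replace the variable of index j by u in t. *)
let rec subst j u t =
  match t with
  | Var i -> if i = j then u else Var i
  | App (t, t') -> App (subst j u t, subst j u t')
  | Abs t -> Abs (subst (j + 1) (shift 1 0 u) t)

(* Perform one β-reduction step (leftmost-outermost), if there is one. *)
let rec reduce t =
  match t with
  | App (Abs t, u) -> Some (shift (-1) 0 (subst 0 (shift 1 0 u) t))
  | App (t, u) ->
    (match reduce t with
     | Some t' -> Some (App (t', u))
     | None ->
       (match reduce u with
        | Some u' -> Some (App (t, u'))
        | None -> None))
  | Abs t -> (match reduce t with Some t' -> Some (Abs t') | None -> None)
  | Var _ -> None

(* Reduce as long as possible: terminates on typable terms. *)
let rec normalize t =
  match reduce t with
  | Some t' -> normalize t'
  | None -> t

let eq t u = normalize t = normalize u

(* Two example terms: λx.x and λx.λy.x. *)
let id = Abs (Var 0)
let k = Abs (Abs (Var 1))
```

With these definitions, eq (App (App (k, id), id)) id holds, whereas eq k id does not.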


Remark 4.2.4.1. In fact, even if we do not suppose that the terms t and u given
as input of the above function are typable, it is still correct in the sense that


— if it answers true then t and u are convertible, and
— if it answers false then t and u are not convertible.


However, there is now a third possibility: nothing guarantees that the
normalization of t or u will terminate. This means that the procedure will not provide
a result in such a case. As such it is not an algorithm, since, by convention,
those should terminate on every input.

4.2.5 Weak normalization. The strong normalization theorem is indeed strong:
it shows that, whichever way we choose to reduce a typable term, we will
eventually end up with a normal form. In practice, however, we care about less
than this: when implementing reduction and normalization, we implement a
particular reduction strategy (see section 3.5.1), and all we want to know is
that this particular strategy will end up with a normal form.

In particular, when this reduction strategy is the call-by-value strategy,
which is by far the most common one, a much simplified version of the above
argument can be used to show that every closed typable term is terminating
according to the chosen strategy, see [Pie02, Chapter 12]. In the following, we
write t ⟶ u to indicate that t reduces to u according to the call-by-value
strategy. An important point about this strategy is that it is deterministic in the
sense that if t ⟶ u and t ⟶ u′ then u = u′. Because of this, strong and weak
normalization coincide for the strategy, and we simply speak of normalizing
terms.
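Determinism can be made concrete by implementing the strategy as a function returning at most one reduct. The following is only a sketch under assumptions which are not taken from the text: terms are closed and use de Bruijn indices, arguments are evaluated after the function (left to right), and we never reduce under abstractions. It uses Option.map, which requires OCaml 4.08 or later.

```ocaml
type term =
  | Var of int
  | App of term * term
  | Abs of term

(* Values are abstractions. *)
let is_value = function Abs _ -> true | _ -> false

(* Replace the variable of index j by the closed term v in t: since v is
   closed, no shifting of indices is needed. *)
let rec subst j v = function
  | Var i -> if i = j then v else Var i
  | App (t, u) -> App (subst j v t, subst j v u)
  | Abs t -> Abs (subst (j + 1) v t)

(* step t is Some t' when t ⟶ t' and None when t cannot be reduced.
   Being a function, it has at most one result: the strategy is
   deterministic. *)
let rec step = function
  | App (Abs t, v) when is_value v -> Some (subst 0 v t)
  | App (t, u) when not (is_value t) ->
    Option.map (fun t' -> App (t', u)) (step t)
  | App (t, u) -> Option.map (fun u' -> App (t, u')) (step u)
  | Var _ | Abs _ -> None

(* Iterate the strategy: terminates on closed typable terms. *)
let rec eval t =
  match step t with
  | Some t' -> eval t'
  | None -> t

let id = Abs (Var 0)
```

For instance, eval (App (App (id, id), App (id, id))) computes id in three steps.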

We define sets R_A of λ-terms by induction on A by


— t ∈ R_X if ⊢ t : X is derivable and t is normalizing,


— t ∈ R_{A→B} if ⊢ t : A → B is derivable, t is normalizing and t u ∈ R_B for
every u ∈ R_A.


Note that contrary to section 4.2.2, by lemma 4.1.5.4, the sets R_A contain only
closed terms of type A: although this is not necessary, it is nice to see that
candidates for a type A only need to involve terms of this type. In section 4.2.2, we
have been using the following property of reduction when showing the properties
of reducibility candidates in proposition 4.2.2.1:


Lemma 4.2.5.1. If t ⟶ t′ and t is normalizing then t′ is.

Proof. An infinite sequence of reductions t′ ⟶ ... starting from t′ can be
extended as one t ⟶ t′ ⟶ ... starting from t, i.e. if t′ is not strongly
normalizing then t is not either. We conclude by contraposition.


A consequence of the determinism of the strategy is that the converse of the
above lemma now also holds:


Lemma 4.2.5.2. If t ⟶ t′ and t′ is strongly normalizing then t is.


Proof. By determinism, an infinite sequence of reductions starting from t is
necessarily of the form t ⟶ t′ ⟶ ..., and thus induces one starting from t′.


Remark 4.2.5.3. Again, this property would not be true with the relation ⟶β,
which is not deterministic. For instance, we have (λx.y) Ω ⟶β y, where (λx.y) Ω
is not strongly normalizing but y is.

We can now easily show variants of the properties of proposition 4.2.2.1. Note 
that the proof is greatly simplified because we do not need to prove them all at 
once. 


Lemma 4.2.5.4 (CR1). If t ∈ R_A then it is strongly normalizing.


Proof. By induction on A, immediate by definition of R_A.


Lemma 4.2.5.5 (CR2). If t ∈ R_A and t ⟶ t′ then t′ ∈ R_A.




Proof. By induction on the type A.


— Suppose that t ∈ R_X. Then t′ has type X by the subject reduction
theorem 4.1.8.3 and t′ is normalizing by lemma 4.2.5.1. Thus t′ ∈ R_X.


— Suppose that t ∈ R_{A→B}. Then t′ has type A → B by theorem 4.1.8.3 and
is normalizing by lemma 4.2.5.1. Given u ∈ R_A, we have t u ∈ R_B by
definition of R_{A→B}, and t u ⟶ t′ u because the reduction strategy is
call-by-value, and thus t′ u ∈ R_B by induction hypothesis. Thus t′ ∈ R_{A→B}.


The last one uses lemma 4.2.5.2 and thus relies on the fact that we have a 
deterministic reduction: 


Lemma 4.2.5.6 (CR3). If t has type A, t ⟶ t′ and t′ ∈ R_A then t ∈ R_A.

Proof. By induction on the type A.

— Suppose that t′ ∈ R_X. Then t ∈ R_X by lemma 4.2.5.2.


— Suppose that t′ ∈ R_{A→B}. Then t′ is strongly normalizing, and thus also
t by lemma 4.2.5.2. Given u ∈ R_A, we have t′ u ∈ R_B by definition
of R_{A→B}, and t u ⟶ t′ u because the reduction strategy is call-by-value,
and thus t u ∈ R_B by induction hypothesis. Thus t ∈ R_{A→B}.


We can then show the following: 


Lemma 4.2.5.7 (Adequacy). Suppose given a term t such that Γ ⊢ t : A is
derivable for some context Γ = x_1 : A_1, ..., x_n : A_n and type A. Then, for
all terms t_i ∈ R_{A_i}, for 1 ≤ i ≤ n, we have t[t_1/x_1, ..., t_n/x_n] ∈ R_A.


Proof. The result is shown by induction on the derivation of Γ ⊢ t : A.


— If the last rule is

\[ \frac{\Gamma, x : A \vdash u : B}{\Gamma \vdash \lambda x.u : A \to B}\ (\to_I) \]

then, by induction hypothesis, for all terms t_i ∈ R_{A_i} and for every term
v ∈ R_A, we have u[t_*/x_*][v/x] = u[t_*/x_*, v/x] ∈ R_B. Since v ∈ R_A, it
is normalizing and there is a reduction v ⟶* v̂ to some normal form v̂,
and we have v̂ ∈ R_A by (CR2). In the call-by-value reduction strategy,
we have

\[ (\lambda x.u)\,v \longrightarrow^* (\lambda x.u)\,\hat v \longrightarrow u[\hat v/x] \]

thus,

\[ (\lambda x.u[t_*/x_*])\,v \longrightarrow^* u[t_*/x_*, \hat v/x] \]

where the term on the right belongs to R_B by induction hypothesis, and
therefore the term on the left as well by (CR3). Since this holds for any
term v ∈ R_A, we have shown t[t_*/x_*] = λx.(u[t_*/x_*]) ∈ R_{A→B}.


Other cases are handled as in lemma 4.2.2.3. 


Finally, we can deduce 


Theorem 4.2.5.8. Given a term t, if there is a type A such that ⊢ t : A is
derivable then t is normalizing.

Proof. Suppose that ⊢ t : A holds. By lemma 4.2.5.7, we have that t ∈ R_A.
Thus t is normalizing by lemma 4.2.5.4.


The call-by-value reduction strategy is complete in the following sense: for every
term t, if there is a term u such that t ⟶β u then there is a term u′ such that
t ⟶ u′. In other words, our strategy can reduce any β-reducible term. We
thus deduce that


Theorem 4.2.5.9 (Weak normalization). Every typable term is weakly normal- 
izing. 


A formalization of these properties is provided in section 7.5.2. 


4.3 Other connectives 


Up to now, for simplicity, we have been limiting ourselves to types built using 
arrows as the only connective. The Curry-Howard correspondence would be 
sad if it stopped there: it actually extends to other usual connectives. For 
instance, the product of two types corresponds to taking the conjunction of the 
two corresponding formulas. Other cases of the correspondence are given in the 
following table: 


Typing             Logic
function    →      ⇒   implication
product     ×      ∧   conjunction
unit        1      ⊤   truth
coproduct   +      ∨   disjunction
empty       0      ⊥   falsity


In order to study this, we will now consider types generated by the following
syntax:

\[ A, B ::= X \mid A \to B \mid A \times B \mid 1 \mid A + B \mid 0 \]


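For each of those connectives, we add a connective between types, as well as
new constructions to the λ-calculus which correspond to introduction and
elimination rules, and the full syntax for λ-terms will be

\[ \begin{array}{rcl}
t, u &::=& x \mid t\,u \mid \lambda x^A.t \\
&\mid& (t, u) \mid \pi_l(t) \mid \pi_r(t) \mid () \\
&\mid& \iota_l^B(t) \mid \iota_r^A(t) \mid \mathrm{case}(t, x \mapsto u, y \mapsto v) \mid \mathrm{case}^A(t)
\end{array} \]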


Moreover, each such connective will give rise to typing rules and the full list of
rules is given in figure 4.3. In addition, we need to add new rules for β-reduction,
which correspond to cut elimination for the new rules (see section 4.1.8),
resulting in the rules of figure 4.4, and η-expansion rules which correspond to
introducing “co-cuts” (see section 4.1.9). We now gradually introduce each of
those.

Most of the important theorems extend to the λ-calculus with these newly
added constructors and types, although we will not detail these:


— confluence (theorem 3.4.3.7), 


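— subject reduction (theorem 4.1.8.3),


\[ \frac{}{\Gamma \vdash x : \Gamma(x)}\ (\mathrm{ax}) \]

\[ \frac{\Gamma \vdash t : A \to B \qquad \Gamma \vdash u : A}{\Gamma \vdash t\,u : B}\ (\to_E)
   \qquad
   \frac{\Gamma, x : A \vdash t : B}{\Gamma \vdash \lambda x^A.t : A \to B}\ (\to_I) \]

\[ \frac{\Gamma \vdash t : A \times B}{\Gamma \vdash \pi_l(t) : A}\ (\times_E^l)
   \qquad
   \frac{\Gamma \vdash t : A \times B}{\Gamma \vdash \pi_r(t) : B}\ (\times_E^r)
   \qquad
   \frac{\Gamma \vdash t : A \qquad \Gamma \vdash u : B}{\Gamma \vdash (t, u) : A \times B}\ (\times_I) \]

\[ \frac{}{\Gamma \vdash () : 1}\ (1_I) \]

\[ \frac{\Gamma \vdash t : A + B \qquad \Gamma, x : A \vdash u : C \qquad \Gamma, y : B \vdash v : C}{\Gamma \vdash \mathrm{case}(t, x \mapsto u, y \mapsto v) : C}\ (+_E) \]

\[ \frac{\Gamma \vdash t : A}{\Gamma \vdash \iota_l^B(t) : A + B}\ (+_I^l)
   \qquad
   \frac{\Gamma \vdash t : B}{\Gamma \vdash \iota_r^A(t) : A + B}\ (+_I^r) \]

\[ \frac{\Gamma \vdash t : 0}{\Gamma \vdash \mathrm{case}^A(t) : A}\ (0_E) \]
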
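Figure 4.3: Typing rules for λ-calculus with products and sums.

β-reduction rules:

\[ \begin{array}{rcl}
(\lambda x.t)\,u &\longrightarrow_\beta& t[u/x] \\
\pi_l((t, u)) &\longrightarrow_\beta& t \\
\pi_r((t, u)) &\longrightarrow_\beta& u \\
\mathrm{case}(\iota_l^B(t), x \mapsto u, y \mapsto v) &\longrightarrow_\beta& u[t/x] \\
\mathrm{case}(\iota_r^A(t), x \mapsto u, y \mapsto v) &\longrightarrow_\beta& v[t/y]
\end{array} \]

Commuting reduction rules:

\[ \begin{array}{rcl}
\mathrm{case}^{A \to B}(t)\,u &\longrightarrow_\beta& \mathrm{case}^B(t) \\
\pi_l(\mathrm{case}^{A \times B}(t)) &\longrightarrow_\beta& \mathrm{case}^A(t) \\
\pi_r(\mathrm{case}^{A \times B}(t)) &\longrightarrow_\beta& \mathrm{case}^B(t) \\
\mathrm{case}(\mathrm{case}^{A+B}(t), x \mapsto u, y \mapsto v) &\longrightarrow_\beta& \mathrm{case}^C(t) \\
\mathrm{case}^A(\mathrm{case}^0(t)) &\longrightarrow_\beta& \mathrm{case}^A(t) \\
\mathrm{case}(t, x \mapsto u, y \mapsto v)\,w &\longrightarrow_\beta& \mathrm{case}(t, x \mapsto u\,w, y \mapsto v\,w) \\
\pi_l(\mathrm{case}(t, x \mapsto u, y \mapsto v)) &\longrightarrow_\beta& \mathrm{case}(t, x \mapsto \pi_l(u), y \mapsto \pi_l(v)) \\
\pi_r(\mathrm{case}(t, x \mapsto u, y \mapsto v)) &\longrightarrow_\beta& \mathrm{case}(t, x \mapsto \pi_r(u), y \mapsto \pi_r(v)) \\
\mathrm{case}^C(\mathrm{case}(t, x \mapsto u, y \mapsto v)) &\longrightarrow_\beta& \mathrm{case}(t, x \mapsto \mathrm{case}^C(u), y \mapsto \mathrm{case}^C(v)) \\
\mathrm{case}(\mathrm{case}(t, x \mapsto u, y \mapsto v), x' \mapsto u', y' \mapsto v') &\longrightarrow_\beta& \mathrm{case}(t, x \mapsto \mathrm{case}(u, x' \mapsto u', y' \mapsto v'), y \mapsto \mathrm{case}(v, x' \mapsto u', y' \mapsto v'))
\end{array} \]

Figure 4.4: Reduction rules for λ-calculus with products and sums.

— strong normalization (theorem 4.2.2.5),


— Curry-Howard correspondence (theorems 4.1.7.1 and 4.1.8.5).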
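To make the typing rules of figure 4.3 more concrete, here is a possible sketch of type inference for this extended calculus. The constructor names (Prod, Inl, Absurd, etc.), the exception Type_error and the representation of contexts as association lists are choices made for this illustration only, and are not taken from the text.

```ocaml
type ty =
  | TVar of string
  | Arr of ty * ty
  | Prod of ty * ty
  | One
  | Sum of ty * ty
  | Zero

type term =
  | Var of string
  | App of term * term
  | Abs of string * ty * term
  | Pair of term * term
  | Fst of term
  | Snd of term
  | Unit
  | Inl of ty * term   (* ι_l^B(t): the annotation is the type B *)
  | Inr of ty * term   (* ι_r^A(t): the annotation is the type A *)
  | Case of term * string * term * string * term
  | Absurd of ty * term   (* case^A(t) *)

exception Type_error

(* Compute the type of a term in the context env, following the rules
   of figure 4.3, or raise Type_error when there is none. *)
let rec infer env = function
  | Var x -> (try List.assoc x env with Not_found -> raise Type_error)
  | App (t, u) ->
    (match infer env t with
     | Arr (a, b) when infer env u = a -> b
     | _ -> raise Type_error)
  | Abs (x, a, t) -> Arr (a, infer ((x, a) :: env) t)
  | Pair (t, u) -> Prod (infer env t, infer env u)
  | Fst t -> (match infer env t with Prod (a, _) -> a | _ -> raise Type_error)
  | Snd t -> (match infer env t with Prod (_, b) -> b | _ -> raise Type_error)
  | Unit -> One
  | Inl (b, t) -> Sum (infer env t, b)
  | Inr (a, t) -> Sum (a, infer env t)
  | Case (t, x, u, y, v) ->
    (match infer env t with
     | Sum (a, b) ->
       let c = infer ((x, a) :: env) u in
       if infer ((y, b) :: env) v = c then c else raise Type_error
     | _ -> raise Type_error)
  | Absurd (a, t) -> if infer env t = Zero then a else raise Type_error

let a, b, c = TVar "A", TVar "B", TVar "C"

(* λf^{A×B→C}.λx^A.λy^B. f (x, y) *)
let curry =
  Abs ("f", Arr (Prod (a, b), c),
       Abs ("x", a, Abs ("y", b, App (Var "f", Pair (Var "x", Var "y")))))

(* λs^{A+B}. case(s, x ↦ ι_r^B(x), y ↦ ι_l^A(y)) *)
let swap =
  Abs ("s", Sum (a, b),
       Case (Var "s", "x", Inr (b, Var "x"), "y", Inl (a, Var "y")))
```

Thanks to the type annotations on abstractions, injections and the eliminator of the empty type, each term has at most one type, which is what allows infer to be a plain recursive function.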


4.3.1 Products. In order to accommodate products, we add the construction

\[ A \times B \]

to the syntax of our types, which corresponds to the product of two types A
and B. We also extend λ-terms with three new constructions:

\[ t, u ::= \ldots \mid (t, u) \mid \pi_l(t) \mid \pi_r(t) \]

where
— (t, u) is the pair of two λ-terms t and u,
— π_l(t) takes the first component of the λ-term t,
— π_r(t) takes the second component of the λ-term t.


We add three new typing rules to our system, one for each of the newly added
constructors:

\[ \frac{\Gamma \vdash t : A \times B}{\Gamma \vdash \pi_l(t) : A}\ (\times_E^l)
   \qquad
   \frac{\Gamma \vdash t : A \times B}{\Gamma \vdash \pi_r(t) : B}\ (\times_E^r)
   \qquad
   \frac{\Gamma \vdash t : A \qquad \Gamma \vdash u : B}{\Gamma \vdash (t, u) : A \times B}\ (\times_I) \]

The first one states that if t is a term, which is a pair consisting of an element
of A and an element of B, then the term π_l(t), obtained by taking its first
projection, has type A. The second rule is similar. The last rule establishes
that if t is of type A and u is of type B then the pair (t, u) is of type A × B.

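If we apply our term erasing procedure of section 4.1.7, and replace the
symbols × by ∧, we recover the rules for conjunction:

\[ \frac{\Gamma \vdash A \land B}{\Gamma \vdash A}\ (\land_E^l)
   \qquad
   \frac{\Gamma \vdash A \land B}{\Gamma \vdash B}\ (\land_E^r)
   \qquad
   \frac{\Gamma \vdash A \qquad \Gamma \vdash B}{\Gamma \vdash A \land B}\ (\land_I) \]

This means that our extension of simply typed λ-calculus is compatible with
the Curry-Howard correspondence (theorem 4.1.7.1).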
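Recall that the cut-elimination rules for conjunction consist in the following
two cases:

\[ \dfrac{\dfrac{\begin{array}{c}\pi\\ \Gamma \vdash A\end{array} \qquad \begin{array}{c}\pi'\\ \Gamma \vdash B\end{array}}{\Gamma \vdash A \land B}\ (\land_I)}{\Gamma \vdash A}\ (\land_E^l)
   \quad\rightsquigarrow\quad
   \begin{array}{c}\pi\\ \Gamma \vdash A\end{array} \]

\[ \dfrac{\dfrac{\begin{array}{c}\pi\\ \Gamma \vdash A\end{array} \qquad \begin{array}{c}\pi'\\ \Gamma \vdash B\end{array}}{\Gamma \vdash A \land B}\ (\land_I)}{\Gamma \vdash B}\ (\land_E^r)
   \quad\rightsquigarrow\quad
   \begin{array}{c}\pi'\\ \Gamma \vdash B\end{array} \]

By the Curry-Howard correspondence, they correspond to the following
transformations of typing derivations:

\[ \dfrac{\dfrac{\begin{array}{c}\pi\\ \Gamma \vdash t : A\end{array} \qquad \begin{array}{c}\pi'\\ \Gamma \vdash u : B\end{array}}{\Gamma \vdash (t, u) : A \times B}\ (\times_I)}{\Gamma \vdash \pi_l((t, u)) : A}\ (\times_E^l)
   \quad\rightsquigarrow\quad
   \begin{array}{c}\pi\\ \Gamma \vdash t : A\end{array} \]

\[ \dfrac{\dfrac{\begin{array}{c}\pi\\ \Gamma \vdash t : A\end{array} \qquad \begin{array}{c}\pi'\\ \Gamma \vdash u : B\end{array}}{\Gamma \vdash (t, u) : A \times B}\ (\times_I)}{\Gamma \vdash \pi_r((t, u)) : B}\ (\times_E^r)
   \quad\rightsquigarrow\quad
   \begin{array}{c}\pi'\\ \Gamma \vdash u : B\end{array} \]

which indicate that the reduction rules associated to the new connectives should
be

\[ \pi_l((t, u)) \longrightarrow_\beta t \qquad\qquad \pi_r((t, u)) \longrightarrow_\beta u \]

as expected: taking the first component of a pair (t, u) returns t, and similarly
for the second component.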

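Finally, the η-expansion rule corresponds to the following transformation of
the proof derivation:

\[ \begin{array}{c}\pi\\ \Gamma \vdash t : A \times B\end{array}
   \quad\rightsquigarrow\quad
   \dfrac{\dfrac{\begin{array}{c}\pi\\ \Gamma \vdash t : A \times B\end{array}}{\Gamma \vdash \pi_l(t) : A}\ (\times_E^l) \qquad \dfrac{\begin{array}{c}\pi\\ \Gamma \vdash t : A \times B\end{array}}{\Gamma \vdash \pi_r(t) : B}\ (\times_E^r)}{\Gamma \vdash (\pi_l(t), \pi_r(t)) : A \times B}\ (\times_I) \]

It should thus consist in the rule

\[ t \longrightarrow_\eta (\pi_l(t), \pi_r(t)) \]

which states some form of “extensionality” of products: a term which is a
product should be the same as the pair consisting of its components.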


Alternative formulations. Alternative formulations are possible for those
connectives. Instead of taking the first and second projection of some term t,
i.e. π_l(t) and π_r(t), we could simply add to our calculus the first and second
projection operators π_l^{A,B} and π_r^{A,B}, whose associated typing rules would be

\[ \frac{}{\Gamma \vdash \pi_l^{A,B} : A \times B \to A}
   \qquad
   \frac{}{\Gamma \vdash \pi_r^{A,B} : A \times B \to B} \]

This would correspond to an approach using combinators, also called
Hilbert-style, see section 4.5.
Another alternative could be to add a constructor

\[ \mathrm{unpair}(t, xy \mapsto u) \]

which would correspond to OCaml’s

let (x, y) = t in u

It binds the two components of a pair t as x and y in u. The corresponding
typing rule would be

\[ \frac{\Gamma \vdash t : A \times B \qquad \Gamma, x : A, y : B \vdash u : C}{\Gamma \vdash \mathrm{unpair}(t, xy \mapsto u) : C} \]

This is the flavor of rules which has to be used when working with dependent
types, see section 8.3.3. We did not use it here because, through the
Curry-Howard correspondence, it corresponds to the following variant of elimination
rule for conjunction

\[ \frac{\Gamma \vdash A \land B \qquad \Gamma, A, B \vdash C}{\Gamma \vdash C}\ (\land_E) \]

which is not the one which is traditionally used (the main reason is that it
involves a “new” formula C, whereas the usual rules (∧_E^l) and (∧_E^r) only use A
and B).


Currying. An important property of product types in λ-calculus is that they are
closely related to arrow types through an isomorphism called currying, which
states that a function with two arguments is the same as a function taking a
pair of arguments. More precisely, given types A, B and C, the two types

\[ A \times B \to C \qquad\text{and}\qquad A \to B \to C \]

are isomorphic, see remark 4.1.7.3, where the first type is implicitly bracketed
as (A × B) → C. In OCaml, this means that it is roughly the same to write a
function of the form

let f x y = ...

or a function of the form

let f (x, y) = ...

More precisely, the isomorphism between the two types means that we have
λ-terms which allow converting elements of one type into elements of the other
type, in both directions:

\[ \lambda f^{A \times B \to C}.\lambda x^A.\lambda y^B. f\,(x, y) : (A \times B \to C) \to (A \to B \to C) \]
\[ \lambda f^{A \to B \to C}.\lambda x^{A \times B}. f\,\pi_l(x)\,\pi_r(x) : (A \to B \to C) \to (A \times B \to C) \]

whose composites are both identities (up to βη-equivalence). Namely, writing t
and u respectively for those terms, we have

\[ \lambda f^{A \times B \to C}. u\,(t\,f) \longrightarrow^*_\beta \lambda f^{A \times B \to C}.\lambda x^{A \times B}. f\,(\pi_l(x), \pi_r(x)) \longleftarrow^*_\eta \lambda f^{A \times B \to C}. f \]
\[ \lambda f^{A \to B \to C}. t\,(u\,f) \longrightarrow^*_\beta \lambda f^{A \to B \to C}.\lambda x^A.\lambda y^B. f\,x\,y \longleftarrow^*_\eta \lambda f^{A \to B \to C}. f \]
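In OCaml, the two conversions can be sketched as follows. The names curry, uncurry, add_pair and add are chosen for this illustration, and the final checks only test the round trips on particular arguments, which is of course weaker than the equalities between λ-terms stated above.

```ocaml
(* From a function on pairs to a function with two arguments. *)
let curry f = fun x y -> f (x, y)

(* From a function with two arguments to a function on pairs. *)
let uncurry f = fun p -> f (fst p) (snd p)

(* A function taking a pair of arguments... *)
let add_pair (x, y) = x + y

(* ... and its curried variant, of type int -> int -> int. *)
let add = curry add_pair

(* The curried variant can be given its first argument only. *)
let succ = add 1

let () =
  assert (succ 41 = 42);
  assert (uncurry (curry add_pair) (2, 3) = add_pair (2, 3));
  assert (curry (uncurry add) 2 3 = add 2 3)
```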
In most programming languages (Java, C, etc.), a function with two arguments
would be given a type of the form A × B → C. Functional programming
languages such as OCaml tend to prefer types of the form A → B → C
because they allow partial application, meaning that we can give the argument
of type A without giving the other one.

4.3.2 Unit. We can add a unit type by adding a constant type

\[ 1 \]

called unit. It corresponds to the unit type of OCaml and, through the
Curry-Howard correspondence, to the formula ⊤. We add a new constant λ-term ()
which is the only element of 1 (up to β-equivalence), and corresponds to () in
OCaml. The typing rule is

\[ \frac{}{\Gamma \vdash () : 1}\ (1_I) \]

which corresponds to the usual rule for truth by term erasure:

\[ \frac{}{\Gamma \vdash \top}\ (\top_I) \]

There are no β- or η-reduction rules.


4.3.3 Coproducts. For coproducts, we add the construction

\[ A + B \]

on types, which represents the coproduct of the two types A and B. Intuitively,
this corresponds to the set-theoretic disjoint union: an element of A + B is either
an element of type A or an element of type B. We add three new constructions
to the syntax of λ-terms:

\[ t, u ::= \ldots \mid \mathrm{case}(t, x \mapsto u, y \mapsto v) \mid \iota_l^B(t) \mid \iota_r^A(t) \]

where t, u and v are terms, x and y are variables and A and B are types. Since
A + B is the disjoint union of A and B, we should be able to see a term t of
type A (resp. B) as a term of type A + B: this is precisely represented by the
term ι_l^B(t) (resp. ι_r^A(t)), which can be thought of as the term t “cast” into an
element of type A + B. For this reason, ι_l^B and ι_r^A are often called the canonical
injections. Conversely, any element of A + B should either be an element of A
or an element of B. This means that we should be able to construct new values
by case analysis: for instance, given a term t of type A + B,


— if t is of type A then we return u(t),
— if t is of type B then we return v(t).


Above, u (resp. v) should be a λ-term with a distinguished free variable x
(resp. y) which is to be replaced by t. In formal notation, such a case analysis
is written

\[ \mathrm{case}(t, x \mapsto u, y \mapsto v) \]

The symbol “↦” is purely formal here (it indicates bound variables), and our
operation takes 5 arguments t, x, u, y and v. With the above intuitions, it
should be no surprise that the typing rules are

\[ \frac{\Gamma \vdash t : A + B \qquad \Gamma, x : A \vdash u : C \qquad \Gamma, y : B \vdash v : C}{\Gamma \vdash \mathrm{case}(t, x \mapsto u, y \mapsto v) : C}\ (+_E) \]

\[ \frac{\Gamma \vdash t : A}{\Gamma \vdash \iota_l^B(t) : A + B}\ (+_I^l)
   \qquad
   \frac{\Gamma \vdash t : B}{\Gamma \vdash \iota_r^A(t) : A + B}\ (+_I^r) \]

From a Curry-Howard perspective, + corresponds to disjunction ∨, and term
erasure of the above typing rules does indeed allow us to recover the usual rules
for disjunction:

\[ \frac{\Gamma \vdash A \lor B \qquad \Gamma, A \vdash C \qquad \Gamma, B \vdash C}{\Gamma \vdash C}\ (\lor_E) \]

\[ \frac{\Gamma \vdash A}{\Gamma \vdash A \lor B}\ (\lor_I^l)
   \qquad
   \frac{\Gamma \vdash B}{\Gamma \vdash A \lor B}\ (\lor_I^r) \]

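The β-reduction rules correspond to the cut-elimination step reducing

\[ \dfrac{\dfrac{\begin{array}{c}\pi\\ \Gamma \vdash t : A\end{array}}{\Gamma \vdash \iota_l^B(t) : A + B}\ (+_I^l) \qquad \begin{array}{c}\pi'\\ \Gamma, x : A \vdash u : C\end{array} \qquad \begin{array}{c}\pi''\\ \Gamma, y : B \vdash v : C\end{array}}{\Gamma \vdash \mathrm{case}(\iota_l^B(t), x \mapsto u, y \mapsto v) : C}\ (+_E) \]

to

\[ \begin{array}{c}\pi'[\pi/x]\\ \Gamma \vdash u[t/x] : C\end{array} \]

as well as the symmetric one, obtained by using ι_r instead of ι_l. The β-reduction
rules are thus

\[ \mathrm{case}(\iota_l^B(t), x \mapsto u, y \mapsto v) \longrightarrow_\beta u[t/x] \]
\[ \mathrm{case}(\iota_r^A(t), x \mapsto u, y \mapsto v) \longrightarrow_\beta v[t/y] \]

The η-expansion rule is

\[ t \longrightarrow_\eta \mathrm{case}(t, x \mapsto \iota_l^B(x), y \mapsto \iota_r^A(y)) \]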


In OCaml. We recall from section 1.3.2 that coproducts can be implemented in
OCaml as the type


type ('a, 'b) coprod =
  | Left of 'a
  | Right of 'b


The injections ι_l and ι_r respectively correspond to Left and Right, and the
eliminator case(t, x ↦ u, y ↦ v) to


match t with
| Left x -> u
| Right y -> v

Church vs Curry style. Note that if we remove the type annotations on the
injections, i.e. write ι_l(t) instead of ι_l^B(t), then the typing of a λ-term is not
unique anymore. Namely, the typing rules become

\[ \frac{\Gamma \vdash t : A}{\Gamma \vdash \iota_l(t) : A + B}\ (+_I^l)
   \qquad
   \frac{\Gamma \vdash t : B}{\Gamma \vdash \iota_r(t) : A + B}\ (+_I^r) \]

and, in the first rule, there is no way of guessing the type B in the conclusion
from the premise (and similarly for the other rule). Similar issues happen if we
remove the type annotations from abstractions; these are detailed in section 4.4.

α-conversion and substitution. The reason why we use the symbol “↦” in terms
case(t, x ↦ u, y ↦ v) is that it indicates that x is bound in u (and similarly for
y in v), in the sense that our α-equivalence should include

\[ \mathrm{case}(t, x \mapsto u, y \mapsto v) =_\alpha \mathrm{case}(t, x' \mapsto u[x'/x], y' \mapsto v[y'/y]) \]

This also means that substitution should take care not to accidentally bind
variables: the equation

\[ (\mathrm{case}(t, x \mapsto u, y \mapsto v))[w/z] = \mathrm{case}(t[w/z], x \mapsto u[w/z], y \mapsto v[w/z]) \]

is valid only when x ∉ FV(w) and y ∉ FV(w).

Such details can be cumbersome in practice when performing
implementations, and we already have spent a great deal of time doing this correctly for
abstractions. It is possible to use an alternative formulation of the
eliminator which allows the use of abstractions, thus simplifying implementations by
having abstractions being the only case where we have to be careful about
capture of variables: in the construction case(t, x ↦ u, y ↦ v), instead of having
u (resp. v) be a λ-term with a distinguished free variable x (resp. y), we can
directly describe it as the function λx.u (resp. λy.v). Our eliminator thus now
has the form

\[ \mathrm{case}(t, u, v) \]

taking three terms in argument, with associated typing rule

\[ \frac{\Gamma \vdash t : A + B \qquad \Gamma \vdash u : A \to C \qquad \Gamma \vdash v : B \to C}{\Gamma \vdash \mathrm{case}(t, u, v) : C}\ (+_E) \]

and the β-reduction rules become

\[ \mathrm{case}(\iota_l^B(t), u, v) \longrightarrow_\beta u\,t \qquad\qquad \mathrm{case}(\iota_r^A(t), u, v) \longrightarrow_\beta v\,t \]
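This function-based eliminator can be sketched directly in OCaml. The coprod type is the one recalled above, and the name case for the OCaml function is chosen here to match the notation.

```ocaml
type ('a, 'b) coprod =
  | Left of 'a
  | Right of 'b

(* The eliminator taking functions as second and third arguments:
   case : ('a, 'b) coprod -> ('a -> 'c) -> ('b -> 'c) -> 'c *)
let case t u v =
  match t with
  | Left x -> u x    (* mirrors case(ι_l(t), u, v) ⟶β u t *)
  | Right y -> v y   (* mirrors case(ι_r(t), u, v) ⟶β v t *)

let () =
  assert (case (Left 3) string_of_int (fun s -> s) = "3");
  assert (case (Right "a") string_of_int (fun s -> s) = "a")
```

The inferred type of case is the one given by the typing rule above, reading A, B and C as the type variables 'a, 'b and 'c.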


4.3.4 Empty type. The empty type is usually written 0. We extend the syntax
of λ-terms

\[ t, u ::= \ldots \mid \mathrm{case}^A(t) \]

by adding one eliminator case^A(t) which allows us to construct an element of an
arbitrary type A, provided that we have constructed an element of the empty
type 0 (which we do not expect to be possible). The typing rule is thus

\[ \frac{\Gamma \vdash t : 0}{\Gamma \vdash \mathrm{case}^A(t) : A}\ (0_E) \]

Through the Curry-Howard correspondence, the type 0 corresponds to the
falsity formula ⊥, and we recover the usual elimination rule

\[ \frac{\Gamma \vdash \bot}{\Gamma \vdash A}\ (\bot_E) \]

There is no β-reduction rule and the η-reduction rule is

\[ \mathrm{case}^0(t) \longrightarrow_\eta t \]

for t of type 0, which corresponds to the transformation

\[ \dfrac{\begin{array}{c}\pi\\ \Gamma \vdash t : 0\end{array}}{\Gamma \vdash \mathrm{case}^0(t) : 0}\ (0_E)
   \quad\rightsquigarrow\quad
   \begin{array}{c}\pi\\ \Gamma \vdash t : 0\end{array} \]

As usual, negation can be implemented as ¬A = A ⇒ ⊥, i.e. we could define
the corresponding type ¬A = A → 0.
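In OCaml, a possible counterpart is an empty variant type, which is available from version 4.07 on. The names empty, absurd, neg, contrapositive and dni below are chosen for this sketch and do not come from the text.

```ocaml
(* A type with no constructor, playing the role of 0. *)
type empty = |

(* The eliminator case^A(t): there is no case to handle, which is what
   the refutation clause "_ -> ." expresses. *)
let absurd (x : empty) : 'a = match x with _ -> .

(* Negation ¬A = A → 0. *)
type 'a neg = 'a -> empty

(* From A → B we get ¬B → ¬A. *)
let contrapositive (f : 'a -> 'b) (k : 'b neg) : 'a neg = fun a -> k (f a)

(* Double negation introduction: A → ¬¬A. *)
let dni (a : 'a) : 'a neg neg = fun k -> k a

(* absurd can be used at any type, here on the empty list of proofs of 0. *)
let () = assert ((List.map absurd ([] : empty list) : int list) = [])
```

Since OCaml allows non-terminating and exception-raising programs, functions of type 'a neg can be written for inhabited types too, so that this only mirrors the logical reading for total programs.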


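4.3.5 Commuting conversions. As explained in section 2.3.6, when
considering both conjunctions and disjunctions, usual cuts are not the only situations
where we want to simplify proofs: we also want to be able to remove the
commutative cuts. By the Curry-Howard correspondence, this means that when
having λ-terms with both products and coproducts, we want some additional
reduction rules, called commuting conversions, which are all listed in figure 4.4.
For instance, we have the following commutative cut

\[ \dfrac{\dfrac{\begin{array}{c}\pi\\ \Gamma \vdash A \lor B\end{array} \qquad \begin{array}{c}\pi'\\ \Gamma, A \vdash C \land D\end{array} \qquad \begin{array}{c}\pi''\\ \Gamma, B \vdash C \land D\end{array}}{\Gamma \vdash C \land D}\ (\lor_E)}{\Gamma \vdash C}\ (\land_E^l) \]

which reduces to

\[ \dfrac{\begin{array}{c}\pi\\ \Gamma \vdash A \lor B\end{array} \qquad \dfrac{\begin{array}{c}\pi'\\ \Gamma, A \vdash C \land D\end{array}}{\Gamma, A \vdash C}\ (\land_E^l) \qquad \dfrac{\begin{array}{c}\pi''\\ \Gamma, B \vdash C \land D\end{array}}{\Gamma, B \vdash C}\ (\land_E^l)}{\Gamma \vdash C}\ (\lor_E) \]

By the Curry-Howard correspondence, this means that the typing derivation

\[ \dfrac{\dfrac{\begin{array}{c}\pi\\ \Gamma \vdash t : A + B\end{array} \qquad \begin{array}{c}\pi'\\ \Gamma, x : A \vdash u : C \times D\end{array} \qquad \begin{array}{c}\pi''\\ \Gamma, y : B \vdash v : C \times D\end{array}}{\Gamma \vdash \mathrm{case}(t, x \mapsto u, y \mapsto v) : C \times D}\ (+_E)}{\Gamma \vdash \pi_l(\mathrm{case}(t, x \mapsto u, y \mapsto v)) : C}\ (\times_E^l) \]

should reduce to

\[ \dfrac{\begin{array}{c}\pi\\ \Gamma \vdash t : A + B\end{array} \qquad \dfrac{\begin{array}{c}\pi'\\ \Gamma, x : A \vdash u : C \times D\end{array}}{\Gamma, x : A \vdash \pi_l(u) : C}\ (\times_E^l) \qquad \dfrac{\begin{array}{c}\pi''\\ \Gamma, y : B \vdash v : C \times D\end{array}}{\Gamma, y : B \vdash \pi_l(v) : C}\ (\times_E^l)}{\Gamma \vdash \mathrm{case}(t, x \mapsto \pi_l(u), y \mapsto \pi_l(v)) : C}\ (+_E) \]

and thus that we should add the reduction rule

\[ \pi_l(\mathrm{case}(t, x \mapsto u, y \mapsto v)) \longrightarrow_\beta \mathrm{case}(t, x \mapsto \pi_l(u), y \mapsto \pi_l(v)) \]

which states that projections can “go through” case operators. Other rules are
obtained similarly.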
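The effect of this last rule can be observed on OCaml programs. In the following sketch, before and after are two hypothetical functions corresponding respectively to the left and right members of the rule, for arbitrarily chosen branches u and v; the rule expresses that they compute the same results.

```ocaml
type ('a, 'b) coprod =
  | Left of 'a
  | Right of 'b

(* π_l(case(t, x ↦ u, y ↦ v)): project after the case analysis. *)
let before t =
  fst (match t with
       | Left x -> (x, x + 1)
       | Right y -> (String.length y, 0))

(* case(t, x ↦ π_l(u), y ↦ π_l(v)): project inside each branch. *)
let after t =
  match t with
  | Left x -> fst (x, x + 1)
  | Right y -> fst (String.length y, 0)

let () =
  assert (before (Left 5) = after (Left 5));
  assert (before (Right "abc") = after (Right "abc"))
```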
4.3.6 Natural numbers. In order to grow λ-calculus into a more full-fledged
programming language, it is also possible to add basic types (integers, strings,
etc.) as well as constants and functions to operate on those. In order to
illustrate this, we explain here how to extend simply typed λ-calculus with natural
numbers. The resulting system, called system T, was originally studied by
Gödel [Göd58].

The types are generated by

\[ A ::= X \mid A \to B \mid \mathrm{Nat} \]

where the newly added type Nat stands for natural numbers. Terms are
generated by

\[ t, u, v ::= x \mid t\,u \mid \lambda x^A.t \mid Z \mid S(t) \mid \mathrm{rec}(t, u, xy \mapsto v) \]

where the term Z stands for the zero constant, and S(t) for the successor of a
term t (supposed to be a natural number). The construction rec(t, u, xy ↦ v),
known as recursor, allows definition of functions by induction:


— if t is 0, it returns u,
— if t is n + 1, it returns v where x has been replaced by n and y by the
value recursively computed for n.


In OCaml. In OCaml, using int as representation for natural numbers, Z would
correspond to 0, S(t) to t+1 and the recursor to


let rec recursor t u v =
  if t = 0 then u else v (t-1) (recursor (t-1) u v)


Alternatively, we can represent natural numbers as the type

type nat = Z | S of nat

where Z corresponds to Z, S to S and the recursor to


let rec recursor t u v =
  match t with
  | Z -> u
  | S n -> v n (recursor n u v)


Traditionally, addition can be defined by induction by


let rec add m n =
  match m with
  | Z -> n
  | S m -> S (add m n)


However, it can be observed that all the induction power we usually need is
already contained in the recursor, so that this can equivalently be defined as


let add m n = recursor m n (fun m r -> S r)

Similarly, multiplication can be defined as


let mul m n = recursor m Z (fun m r -> add r n)

and other traditional functions (exponentiation, Ackermann’s function, etc.) are
left to the reader.
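The above definitions can be gathered in the following self-contained file. The functions exp and ack are only one possible way of doing the exercise, and the helpers of_int and to_int are added here for testing purposes; none of these four comes from the text. Note that ack uses the recursor at the type nat -> nat, which is where the recursor goes beyond primitive recursion on natural numbers.

```ocaml
type nat = Z | S of nat

let rec recursor t u v =
  match t with
  | Z -> u
  | S n -> v n (recursor n u v)

let add m n = recursor m n (fun _ r -> S r)

let mul m n = recursor m Z (fun _ r -> add r n)

(* exp m n computes m to the power n, by recursion on n. *)
let exp m n = recursor n (S Z) (fun _ r -> mul r m)

(* Ackermann's function: the outer recursor computes a function, and
   the inner one iterates the function f obtained for the predecessor. *)
let ack m =
  recursor m
    (fun n -> S n)
    (fun _ f -> fun n -> recursor n (f (S Z)) (fun _ r -> f r))

(* Conversions with OCaml integers, only used for testing. *)
let rec of_int k = if k <= 0 then Z else S (of_int (k - 1))
let to_int n = recursor n 0 (fun _ r -> r + 1)

let () =
  assert (to_int (add (of_int 2) (of_int 3)) = 5);
  assert (to_int (mul (of_int 2) (of_int 3)) = 6);
  assert (to_int (exp (of_int 2) (of_int 3)) = 8);
  assert (to_int (ack (of_int 2) (of_int 3)) = 9)
```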

There are, however, some functions that can be written using recursion in 
OCaml, but cannot be encoded using recursor. For instance, all functions writ- 
ten with the recursor are total and therefore the function 


let rec omega n = omega n 


cannot be implemented using it, since it never produces a result whereas the 
recursor always defines well-defined functions. 


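Rules. The typing rules for the new terms are the following ones:

\[ \frac{}{\Gamma \vdash Z : \mathrm{Nat}}
   \qquad
   \frac{\Gamma \vdash t : \mathrm{Nat}}{\Gamma \vdash S(t) : \mathrm{Nat}} \]

\[ \frac{\Gamma \vdash t : \mathrm{Nat} \qquad \Gamma \vdash u : A \qquad \Gamma, x : \mathrm{Nat}, y : A \vdash v : A}{\Gamma \vdash \mathrm{rec}(t, u, xy \mapsto v) : A}\ (\mathrm{Nat}_E) \]

The reduction rules ensure that the recursor implements the primitive recursion
rules:

\[ \mathrm{rec}(Z, u, xy \mapsto v) \longrightarrow_\beta u \]
\[ \mathrm{rec}(S(t), u, xy \mapsto v) \longrightarrow_\beta v[t/x, \mathrm{rec}(t, u, xy \mapsto v)/y] \]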


Properties. It can be shown, see section 4.3.7, that this system is terminating 
and confluent. Moreover, the functions of type 


\[ \mathrm{Nat} \to \mathrm{Nat} \]


which can be implemented in this system are precisely the recursive functions 
which are provably total (in Peano Arithmetic, see section 5.2.5), i.e. recursive 
functions for which there is a proof that they terminate on every input. This 
class of functions strictly includes the primitive recursive ones, and it is strictly 
included in the class of total recursive functions. 


4.3.7 Strong normalization. The strong normalization proof presented in
section 4.2.2 for simply typed λ-calculus extends to the other connectives
presented above. For instance, following [Gir89, chapter 7], let us briefly explain
how to adapt the proof for a λ-calculus with products, unit and natural numbers.
Types are thus generated by

\[ A, B ::= X \mid A \to B \mid A \times B \mid 1 \mid \mathrm{Nat} \]

and terms by

\[ t, u ::= x \mid \lambda x^A.t \mid t\,u \mid (t, u) \mid \pi_l(t) \mid \pi_r(t) \mid () \mid Z \mid S(t) \mid \mathrm{rec}(t, u, xy \mapsto v) \]

We extend the notion of reducibility candidate by

\[ \begin{array}{rcl}
R_X = R_1 = R_{\mathrm{Nat}} &=& \{ t \mid t \text{ is strongly normalizable} \} \\
R_{A \to B} &=& \{ t \mid u \in R_A \text{ implies } t\,u \in R_B \} \\
R_{A \times B} &=& \{ t \mid \pi_l(t) \in R_A \text{ and } \pi_r(t) \in R_B \}
\end{array} \]




We also extend the notion of neutral term: a term is neutral when it is not of
one of the following forms

$$\lambda x.t \qquad \langle t, u \rangle \qquad \langle\rangle \qquad \mathrm{Z} \qquad \mathrm{S}(t)$$


which correspond to the possible introduction rules in our system. With those 
definitions, the proofs can be performed following the same structure as in 
section 4.2.2. 


4.4 Curry style typing 

In this section, we investigate simply typed $\lambda$-calculus à la Curry, where
abstractions are of the form $\lambda x.t$ instead of $\lambda x^A.t$, i.e. we do not indicate the
type of abstracted variables. A detailed presentation of this topic can be found
in [Pie02, chapter 22].


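4.4.1 A typing system. Curry-style typing is closer to languages such as
OCaml, where we do not have to indicate the type of the arguments of a
function. For simplicity, we consider functions only, i.e. types are defined by

$$A, B ::= X \mid A \to B$$

and terms are defined by

$$t, u ::= x \mid \lambda x.t \mid t\,u$$

similarly to the beginning of this chapter. The typing rules are

$$\frac{}{\Gamma \vdash x : \Gamma(x)}\ (\mathrm{ax})
\qquad
\frac{\Gamma, x : A \vdash t : B}{\Gamma \vdash \lambda x.t : A \to B}\ (\to_\mathrm{I})
\qquad
\frac{\Gamma \vdash t : A \to B \qquad \Gamma \vdash u : A}{\Gamma \vdash t\,u : B}\ (\to_\mathrm{E})$$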


This seemingly minor change of not writing types for abstractions has major
consequences on the properties of typing. In particular, theorem 4.1.6.1 does
not hold anymore: a given $\lambda$-term might admit multiple types. For instance,
the identity $\lambda$-term admits the following types:

$$\dfrac{\dfrac{}{x : X \vdash x : X}\ (\mathrm{ax})}{\vdash \lambda x.x : X \to X}\ (\to_\mathrm{I})
\qquad\qquad
\dfrac{\dfrac{}{x : Y \to Z \vdash x : Y \to Z}\ (\mathrm{ax})}{\vdash \lambda x.x : (Y \to Z) \to (Y \to Z)}\ (\to_\mathrm{I})$$

and in fact, every type of the form $A \to A$ for some type $A$ is an admissible
type for the identity.

The reason for this is that when we derive a type containing a type variable
in this system, we can always replace this variable by any other type. Formally,
given types $A$ and $B$ and a type variable $X$, we write $A[B/X]$ for the type
obtained from $A$ by replacing every occurrence of $X$ by $B$. Similarly, given
a context $\Gamma$, we write $\Gamma[B/X]$ for the context where $X$ has been replaced by $B$
in every type. We have

Lemma 4.4.1.1. If $\Gamma \vdash t : A$ is derivable then $\Gamma[B/X] \vdash t : A[B/X]$ is also
derivable for every type $B$ and variable $X$.


Proof. By induction on the derivation of $\Gamma \vdash t : A$.


For instance, since the identity admits the type $X \to X$, it also admits the same
type where $X$ has been replaced by $Y \to Z$, i.e. $(Y \to Z) \to (Y \to Z)$. The
first type is “more general” than the second, in the sense that the second can
be obtained by substituting type variables in the first. We will see that any
typable term admits a type which is “most general”, in the sense that it is more
general than any other of its types. For instance, the most general type for identity
is $X \to X$. Again, this phenomenon is not present in Church style typing,
e.g. the two terms

$$\lambda x^X.x : X \to X \qquad\qquad \lambda x^{Y \to Z}.x : (Y \to Z) \to (Y \to Z)$$

are distinct: Curry is more spicy than Church.


4.4.2 Principal types. Recall from section 2.2.11 that a substitution $\sigma$ is a
function which associates a type to each type variable in $\mathcal{X}$. Its domain $\mathrm{dom}(\sigma)$
is the set of type variables

$$\mathrm{dom}(\sigma) = \{X \in \mathcal{X} \mid \sigma(X) \neq X\}$$

This set will always be finite for the substitutions we consider here, so that, in
practice, a substitution can be described by the images of the variables $X$ in its
domain. Given a type $A$, we write $A[\sigma]$ for the type $A$ where every variable $X$
has been replaced by $\sigma(X)$. Formally, it is defined by induction on the type $A$
by

$$X[\sigma] = \sigma(X)$$
$$(A \to B)[\sigma] = A[\sigma] \to B[\sigma]$$

We say that a type $A$ is more general than a type $B$, which we write $A \sqsubseteq B$,
when there is a substitution $\sigma$ such that $B = A[\sigma]$.
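These two definitions can be sketched in OCaml as follows. This is only an illustration under stated assumptions: type variables are represented by integers, a substitution is represented by the list of the images of the variables of its domain, and the names apply, matching and more_general are hypothetical.

```ocaml
(* Types, with variables represented by their number. *)
type ty = TVar of int | TArr of ty * ty

(* A substitution, given by the images of the variables of its domain. *)
type subst = (int * ty) list

(* A[s]: replace every variable X by s(X). *)
let rec apply (s : subst) = function
  | TVar x -> (try List.assoc x s with Not_found -> TVar x)
  | TArr (a, b) -> TArr (apply s a, apply s b)

(* [matching a b] returns [Some s] with b = a[s] when a is more general
   than b, and [None] otherwise. *)
let matching a b =
  let rec aux s a b =
    match a, b with
    | TVar x, _ ->
      (match List.assoc_opt x s with
       | None -> Some ((x, b) :: s)
       | Some b' -> if b' = b then Some s else None)
    | TArr (a1, a2), TArr (b1, b2) ->
      (match aux s a1 b1 with
       | Some s -> aux s a2 b2
       | None -> None)
    | TArr _, TVar _ -> None
  in
  aux [] a b

let more_general a b = matching a b <> None
```

For instance, with X, Y, Z numbered 0, 1, 2, the type X -> X is found to be more general than (Y -> Z) -> (Y -> Z), but not the converse.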


Lemma 4.4.2.1. The relation $\sqsubseteq$ is a partial order on types modulo $\alpha$-conversion.


Given a context $\Gamma = x_1 : A_1, \ldots, x_n : A_n$, we also write

$$\Gamma[\sigma] = x_1 : A_1[\sigma], \ldots, x_n : A_n[\sigma]$$

In this case, we sometimes say that the context $\Gamma[\sigma]$ is a refinement of the
context $\Gamma$. It is easily shown that if a term admits a type, it also admits a less
general type: lemma 4.4.1.1 generalizes as follows.


Lemma 4.4.2.2. Given a term $t$ such that $\Gamma \vdash t : A$ is derivable and a
substitution $\sigma$ then $\Gamma[\sigma] \vdash t : A[\sigma]$ is also derivable.


Proof. By induction on the derivation of $\Gamma \vdash t : A$.

Definition 4.4.2.3 (Principal type). Given a context $\Gamma$ and a $\lambda$-term $t$, a principal
type (or most general type) for $t$ in the context $\Gamma$ consists of a substitution $\sigma$
and a type $A$ such that

- $\Gamma[\sigma] \vdash t : A$ is derivable,

- for every substitution $\tau$ such that $\Gamma[\tau] \vdash t : B$ is derivable, there exists a
  substitution $\tau'$ such that $\tau = \tau' \circ \sigma$ and $B = A[\tau']$.

In other words, the most general type is a type $A$ for $t$ in some refinement of
the context $\Gamma$ such that every other type can be obtained by substitution, in
the sense of lemma 4.4.2.2.

This is often used in the case where the context $\Gamma$ is empty, in which case the
substitution $\sigma$ is not relevant. In this case, the principal type for $t$ is a type $A$
such that $\vdash t : A$ is derivable and which is minimal: given a type $B$, we have
$\vdash t : B$ derivable if and only if $A \sqsubseteq B$.


Example 4.4.2.4. The principal type for $t = \lambda x.x$ is $X \to X$: the types of $t$ are
those of the form $A \to A$ for some type $A$.


Example 4.4.2.5. The principal types for the $\lambda$-terms

$$\lambda xyz.(xz)(yz) \qquad\text{and}\qquad \lambda xy.x$$

are respectively

$$(X \to Y \to Z) \to (X \to Y) \to X \to Z \qquad\text{and}\qquad X \to Y \to X$$


4.4.3 Computing the principal type. We now give an algorithm to compute
the principal type of a $\lambda$-term. A type equation system is a finite set

$$E = \{A_1 \overset{?}{=} B_1, \ldots, A_n \overset{?}{=} B_n\} \tag{4.2}$$

consisting of pairs of types $A_i$ and $B_i$, written $A_i \overset{?}{=} B_i$ and called type constraints.
A substitution $\sigma$ is a solution of the equation system (4.2) when applying it
makes every equation of $E$ valid, i.e. for every $1 \leq i \leq n$ we have

$$A_i[\sigma] = B_i[\sigma]$$


Typing with constraints. The idea is that to every context $\Gamma$ and $\lambda$-term $t$, we
are going to associate a type $A$ and a type equation system $E$ which are complete
in the sense that

- for every solution $\sigma$ of $E$, we have $\Gamma[\sigma] \vdash t : A[\sigma]$,

- if there is a substitution $\sigma$ such that $\Gamma[\sigma] \vdash t : B$, then $\sigma$ is a
  solution of $E$ such that $B = A[\sigma]$.

In this sense, the solutions of $E$ describe all the possible types of $t$ in the
refinements of the context $\Gamma$. The elements of $E$ are sometimes called constraints
since they encode constraints on acceptable substitutions. We will do so by
imposing the “minimal amount of equations” to $E$ so that $t$ admits a type $A$ in
the context $\Gamma$. As usual, this is performed by induction on $t$, distinguishing the
three possible cases:

- $x$: we have a type $A$ if and only if $x \in \mathrm{dom}(\Gamma)$, in which case $A = \Gamma(x)$,

- $\lambda x.t$: the type $A$ should be of the form $B \to C$ where $C$ is the type
  of $t$. Writing $A_t$ for the type inferred for $t$, we thus define $A = X \to A_t$
  for some fresh variable $X$,

- $t\,u$: we have a type $A$ if and only if $t$ is of type $B \to A$ and $u$ is
  of type $B$. Writing $A_t$ for the type inferred for $t$ and $A_u$ for the type
  inferred for $u$, we thus define $A = X$ for some fresh variable $X$ and add
  the equation

  $$A_t \overset{?}{=} (A_u \to X)$$

Above, the fact that $X$ is “fresh” means that it does not occur anywhere else
(in the contexts, the types or the equation systems).


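Sequent presentation. More formally, this can be presented in the form of a
“sequent” calculus, where the sequents are of the form

$$\Gamma \vdash t : A \mid E$$

where $\Gamma$ is a context, $t$ is a term, $A$ is a type and $E$ is a type equation system:
given $\Gamma$ and $t$, the derivation of such a sequent will be seen as producing the
type $A$ and the equations $E$. The rules are

$$\frac{}{\Gamma \vdash x : \Gamma(x) \mid \emptyset}\ (\mathrm{ax})$$

$$\frac{\Gamma, x : X \vdash t : A_t \mid E_t}{\Gamma \vdash \lambda x.t : X \to A_t \mid E_t}\ (\to_\mathrm{I}) \quad \text{with $X$ fresh}$$

$$\frac{\Gamma \vdash t : A_t \mid E_t \qquad \Gamma \vdash u : A_u \mid E_u}{\Gamma \vdash t\,u : X \mid E_t \cup E_u \cup \{A_t \overset{?}{=} (A_u \to X)\}}\ (\to_\mathrm{E}) \quad \text{with $X$ fresh}$$


Example 4.4.3.1. For instance, for the term $\lambda f x.f\,x$, we have the following
derivation:

$$\dfrac{\dfrac{\dfrac{\dfrac{}{f : Z, x : X \vdash f : Z \mid \emptyset}\ (\mathrm{ax}) \qquad \dfrac{}{f : Z, x : X \vdash x : X \mid \emptyset}\ (\mathrm{ax})}{f : Z, x : X \vdash f\,x : Y \mid Z \overset{?}{=} (X \to Y)}\ (\to_\mathrm{E})}{f : Z \vdash \lambda x.f\,x : X \to Y \mid Z \overset{?}{=} (X \to Y)}\ (\to_\mathrm{I})}{\vdash \lambda f x.f\,x : Z \to (X \to Y) \mid Z \overset{?}{=} (X \to Y)}\ (\to_\mathrm{I})$$
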

The type $A$ and the equations $E$ describe exactly all the possible types for $t$
in the context $\Gamma$ in the following sense.

Lemma 4.4.3.2. Suppose that $\Gamma \vdash t : A \mid E$ is derivable using the above rules,
then

- for every solution $\sigma$ of $E$ the sequent $\Gamma[\sigma] \vdash t : A[\sigma]$ is derivable (in the
  sense of section 4.1.4),

- if there is a substitution $\sigma$ and a type $B$ such that $\Gamma[\sigma] \vdash t : B$ is derivable
  then $\sigma$ is a solution of $E$ and $B = A[\sigma]$.


Proof. By induction on the derivation of $\Gamma \vdash t : A \mid E$.


It is easily seen that given a context $\Gamma$ and a term $t$ there is exactly one type $A$
and system $E$ such that $\Gamma \vdash t : A \mid E$ is derivable (up to the choice of type
variables): we can thus speak of the type $A$ and the system $E$ associated to a
term $t$ in a context $\Gamma$. Moreover, the above rules are easily translated into a
method for computing those. An implementation of the resulting algorithm is
provided in figure 4.5: given an environment env describing the context $\Gamma$ and a
term, the function infer generates the type $A$ and the equation system $E$, encoded
as a list of pairs of types.
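To see the function at work, the following block restates the definitions of figure 4.5 (so that it can be run on its own) and applies infer to the term of example 4.4.3.1. The names a and e are only introduced for the illustration, and the numbering of the variables depends on the order in which fresh is called.

```ocaml
type ty = TVar of int | TArr of ty * ty

(* Fresh type variables, numbered from 0. *)
let fresh = let n = ref (-1) in fun () -> incr n; TVar !n

type term = Var of string | Abs of string * term | App of term * term

type teq = (ty * ty) list

let rec infer env : term -> ty * teq = function
  | Var x -> List.assoc x env, []
  | Abs (x, t) ->
    let ax = fresh () in
    let at, et = infer ((x, ax) :: env) t in
    TArr (ax, at), et
  | App (t, u) ->
    let at, et = infer env t in
    let au, eu = infer env u in
    let ax = fresh () in
    ax, (at, TArr (au, ax)) :: (et @ eu)

(* The term λf x. f x of example 4.4.3.1, in the empty environment. *)
let a, e = infer [] (Abs ("f", Abs ("x", App (Var "f", Var "x"))))
(* a = TArr (TVar 0, TArr (TVar 1, TVar 2))
   e = [TVar 0, TArr (TVar 1, TVar 2)] *)
```

Reading TVar 0, TVar 1 and TVar 2 as Z, X and Y, this is the type Z -> (X -> Y) together with the single equation between Z and X -> Y obtained in the derivation of example 4.4.3.1.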


Computing the principal type. What is not clear yet is
- how to compute the solutions of $E$,
- how to compute the most general type for $t$ in the context $\Gamma$.


We will see in section 5.4 that if a system of equations admits a solution then
it admits a most general one: provided there is a solution, there is a solution $\sigma$
such that the solutions are exactly substitutions of the form $\tau \circ \sigma$ for some
substitution $\tau$. Moreover, we will see an algorithm to actually compute this
most general solution: this is called the unification algorithm. This finally
provides us with what we were looking for.


Theorem 4.4.3.3. Suppose given a context $\Gamma$ and a term $t$. Consider the type $A$
and the system $E$ such that $\Gamma \vdash t : A \mid E$ is derivable, and write $\sigma$ for the most
general solution of $E$. Then the substitution $\sigma$ together with the type $A[\sigma]$ is a
principal type for $t$ in the environment $\Gamma$.
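Pending the detailed presentation of section 5.4, here is a naive sketch of how a most general solution of a list of equations might be computed, in order to make the theorem concrete. It is not claimed to be the algorithm of section 5.4: the representation of substitutions as association lists and the names apply, solve and Not_unifiable are assumptions made for this example only.

```ocaml
type ty = TVar of int | TArr of ty * ty

(* Apply a substitution, given as an association list, to a type. *)
let rec apply s = function
  | TVar x -> (try List.assoc x s with Not_found -> TVar x)
  | TArr (a, b) -> TArr (apply s a, apply s b)

(* Does the variable number x occur in the type? *)
let rec occurs x = function
  | TVar y -> x = y
  | TArr (a, b) -> occurs x a || occurs x b

exception Not_unifiable

(* Most general solution of a list of equations. *)
let rec solve = function
  | [] -> []
  | (a, b) :: l when a = b -> solve l
  | (TVar x, a) :: l | (a, TVar x) :: l ->
    if occurs x a then raise Not_unifiable;
    (* Replace x by a in the remaining equations before solving them. *)
    let s = [x, a] in
    let l = List.map (fun (b, c) -> apply s b, apply s c) l in
    let s' = solve l in
    (x, apply s' a) :: s'
  | (TArr (a, b), TArr (a', b')) :: l -> solve ((a, a') :: (b, b') :: l)
```

With the type Z -> (X -> Y) and the equation between Z and X -> Y computed for λf x. f x, applying the solution to the type gives (X -> Y) -> (X -> Y), which is the expected principal type.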


(** Types. *)
type ty =
  | TVar of int
  | TArr of ty * ty

(** Generate a fresh type variable. *)
let fresh =
  let n = ref (-1) in fun () -> incr n; TVar !n

(** Terms. *)
type term =
  | Var of string
  | Abs of string * term
  | App of term * term

(** Type constraints. *)
type teq = (ty * ty) list

(** Type and equations. *)
let rec infer env : term -> ty * teq = function
  | Var x -> List.assoc x env, []
  | Abs (x, t) ->
    let ax = fresh () in
    let at, et = infer ((x,ax)::env) t in
    TArr (ax, at), et
  | App (t, u) ->
    let at, et = infer env t in
    let au, eu = infer env u in
    let ax = fresh () in
    ax, (at, TArr (au, ax))::(et@eu)

Figure 4.5: Typability with constraints in OCaml.


In-place unification. In practice, people do not implement the computation of
most general types by first generating equations and then solving them, although
there are notable exceptions [PR05]: we can directly change the value of the
variables instead of deferring this using equations. Moreover, this can be done
efficiently by using references. We will see in section 5.4.5 that unification
can always be performed this way, and only describe here the implementation
specialized to our problem.
Instead of generating equations, we can replace type variables as follows:

- when we have an equation of the form $X \overset{?}{=} A$, we can directly replace $X$
  by $A$, provided that $X \notin \mathrm{FV}(A)$,

- when we have an equation of the form $A \overset{?}{=} X$, we can directly replace $X$
  by $A$, provided that $X \notin \mathrm{FV}(A)$,

- when we have an equation of the form $(A \to B) \overset{?}{=} (A' \to B')$, we can
  replace it by the two equations $A \overset{?}{=} A'$ and $B \overset{?}{=} B'$, and recursively act
  on those.


In order to perform this efficiently, we change the representation of type variables 
to the following: 


(** Types. *)
type ty =
  | TVar of tvar ref
  | TArr of ty * ty

(** Type variables. *)
and tvar =
  | Link of ty (* a substituted type variable *)
  | AVar of int (* a type variable *)

A variable, corresponding to the constructor TVar, is now a reference, meaning 
that its value can be changed. Initially, this reference will point to a value of 
the form AVar n, meaning that it is the variable with number n. However, we 
can replace its contents by another type A, in which case we make the reference 
point to a value of the form Link A (it is a “link” to the type A): this method 
has the advantage of changing at once the contents of all the occurrences of 
the variable. The type tvar thus indicates the possible values for a variable: 
it is either a real variable (AVar) or a substituted variable (Link). With this 
representation, a variable containing a link to a type A should be handled as 
if it was the type A. To this end, we implement a function which will remove 
links at the top level of types: 


let rec unlink = function
  | TArr (a, b) -> TArr (a, b)
  | TVar v as a ->
    match !v with
    | Link a -> unlink a
    | AVar _ -> a


In order to check the side condition $X \notin \mathrm{FV}(A)$ above, we need to implement
a function which checks whether a variable $X$ occurs in a type $A$, i.e. whether
$X \in \mathrm{FV}(A)$. This is easily done by induction on $A$:


let rec occurs x = function
  | TArr (a, b) -> occurs x a || occurs x b
  | TVar v ->
    match !v with
    | Link a -> occurs x a
    | AVar _ as y -> x = y

Next, instead of generating an equation $A \overset{?}{=} B$, we will use the following function
which will replace the type variables in $A$ and $B$ following the method described
above, called unification:


let rec unify a b =
  (* Links are removed on both sides before comparing the types. *)
  match unlink a, unlink b with
  | TVar v, TVar w when v == w -> () (* same variable: nothing to do *)
  | TVar v, b -> assert (not (occurs !v b)); v := Link b
  | a, TVar v -> assert (not (occurs !v a)); v := Link a
  | TArr (a, b), TArr (a', b') -> unify a a'; unify b b'


Finally, the type inference algorithm can be implemented as before, except that
we do not return the equations anymore, only the type, since type variables are
changed in place: in the case of application, instead of generating the equation
$A_t \overset{?}{=} (A_u \to X)$, we instead call the function unify which will replace type
variables in a minimal way needed to make the types $A_t$ and $A_u \to X$ equal:


let rec infer env = function
  | Var x ->
    List.assoc x env
  | App (t, u) ->
    let a = infer env u in
    let b = fresh () in
    unify (infer env t) (TArr (a, b));
    b
  | Abs (x, t) ->
    let a = fresh () in
    let b = infer ((x, a) :: env) t in
    TArr (a, b)


Example 4.4.3.4. The term $\lambda f x.f\,x$ can be represented as the term

Abs ("f", Abs ("x", App (Var "f", Var "x")))


If we infer its type (in the empty environment) using the above function infer,
we obtain the following result


TArr
  (TVar
    {contents =
      Link
        (TArr (TVar {contents = AVar 1}, TVar {contents = AVar 2}))},
   TArr (TVar {contents = AVar 1}, TVar {contents = AVar 2}))


which is OCaml’s way of saying

$$(X \to Y) \to (X \to Y)$$

(in OCaml, references are implemented as records with one mutable field
labeled contents).
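Such values are hard to read, and a small printing function which follows the links makes the result easier to inspect. The following is a sketch: the function to_string and the convention of printing the variable number n as Xn are assumptions of this example, and the value example is the above result built by hand.

```ocaml
type ty = TVar of tvar ref | TArr of ty * ty
and tvar = Link of ty | AVar of int

(* Print a type, transparently following the links. *)
let rec to_string = function
  | TVar { contents = Link a } -> to_string a
  | TVar { contents = AVar n } -> "X" ^ string_of_int n
  | TArr (a, b) -> "(" ^ to_string a ^ " -> " ^ to_string b ^ ")"

(* The type obtained for λf x. f x, built by hand. *)
let example =
  let x1 = TVar (ref (AVar 1)) and x2 = TVar (ref (AVar 2)) in
  let f = TVar (ref (Link (TArr (x1, x2)))) in
  TArr (f, TArr (x1, x2))
```

Here, to_string example evaluates to the string "((X1 -> X2) -> (X1 -> X2))".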


Remark 4.4.3.5. In the unification function, when facing an equation $X \overset{?}{=} A$,
it is important to check that $X$ does not occur in $A$. For instance, let us try
to type $\lambda x.xx$, which is not expected to be typable. The inference will roughly
proceed as follows.


1. Since it is an abstraction, the type of $\lambda x.xx$ must be of the form $X \to A$,
   where $A$ is the type of $xx$. Let’s find the type of $xx$ assuming $x$ of type $X$.


2. The term $xx$ is an application whose function is $x$ of type $X$ and argument
   is $x$ of type $X$. We must therefore have $X \overset{?}{=} (X \to Y)$ and the type of $xx$
   is $Y$.

With the above implementation, the algorithm will raise an error: the unification
of $X$ and $X \to Y$ will fail because $X \in \mathrm{FV}(X \to Y)$. If we forgot to check this,
we would generate for $x$ the type $X \to Y$ where $X$ is (physically) the type itself.
This would intuitively correspond to allowing the infinite type

$$(((\ldots \to Y) \to Y) \to Y) \to Y$$

which should not be allowed.
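The whole in-place inference can be assembled into a small program on which this behavior can be observed. The block below is a sketch which departs from the above code on a few points, all of which are assumptions of the example: errors are reported with a hypothetical exception Type_error instead of an assertion, variables are compared as references (with physical equality), and the helper typable is introduced for testing.

```ocaml
type ty = TVar of tvar ref | TArr of ty * ty
and tvar = Link of ty | AVar of int

type term = Var of string | Abs of string * term | App of term * term

exception Type_error

let fresh = let n = ref (-1) in fun () -> incr n; TVar (ref (AVar !n))

(* Remove the links at the top level of a type. *)
let rec unlink = function
  | TVar { contents = Link a } -> unlink a
  | a -> a

(* Does the variable (given as a reference) occur in the type? *)
let rec occurs v = function
  | TArr (a, b) -> occurs v a || occurs v b
  | TVar { contents = Link a } -> occurs v a
  | TVar w -> v == w

let rec unify a b =
  match unlink a, unlink b with
  | TVar v, TVar w when v == w -> ()
  | TVar v, c | c, TVar v ->
    (* The occurs check of remark 4.4.3.5. *)
    if occurs v c then raise Type_error;
    v := Link c
  | TArr (a, b), TArr (a', b') -> unify a a'; unify b b'

let rec infer env = function
  | Var x -> (try List.assoc x env with Not_found -> raise Type_error)
  | Abs (x, t) ->
    let a = fresh () in
    let b = infer ((x, a) :: env) t in
    TArr (a, b)
  | App (t, u) ->
    let a = infer env u in
    let b = fresh () in
    unify (infer env t) (TArr (a, b));
    b

(* Is a closed term typable? *)
let typable t = try ignore (infer [] t); true with Type_error -> false
```

With these definitions, typable accepts the term λf x. f x and rejects the term λx. x x, the second one being refused precisely by the occurs check.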


Typability. The above algorithms can also be used to decide the typability of a
term $t$, i.e. answer the following question: is there a context in which $t$ admits
a type?


Theorem 4.4.3.6. The typability problem for $\lambda$-calculus is decidable.


Proof. Suppose given a term $t$. We write $\mathrm{FV}(t) = \{x_1, \ldots, x_n\}$ for the set of free
variables and define the context $\Gamma = x_1 : X_1, \ldots, x_n : X_n$. Using lemma 4.1.5.1,
it is not difficult to show that $t$ admits a type if and only if it admits a type in
a refinement of the context $\Gamma$, which can be decided as above.
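The context used in this proof is easily computed. The following sketch, in which the function names fv and context are hypothetical and type variables are simply represented by their number, lists the free variables of a term and assigns a distinct type variable to each of them; a full implementation would rather take those variables from the same supply of fresh variables as the inference function.

```ocaml
type term = Var of string | Abs of string * term | App of term * term

(* Free variables of a term, without duplicates. *)
let rec fv = function
  | Var x -> [x]
  | Abs (x, t) -> List.filter (fun y -> y <> x) (fv t)
  | App (t, u) ->
    let l = fv t in
    l @ List.filter (fun y -> not (List.mem y l)) (fv u)

(* The context x1 : X1, ..., xn : Xn, the type variable Xi being
   represented by the number i (counting from 0). *)
let context t = List.mapi (fun i x -> x, i) (fv t)
```

For instance, for the term f (f x), the function context returns the list associating 0 to f and 1 to x.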


4.4.4 Hindley-Milner type inference. In this section, we go on a small
excursion and investigate polymorphic types. We have seen that a Curry-style
$\lambda$-term usually admits multiple types. However, a given term cannot be used
within a same term with two different types. In a real-world programming
language this is a problem: for instance, if we define the identity function, we
cannot apply it both to integers and strings. If we want to do so, we need
to define two identity functions, one for integers and one for strings, with the
same definition. One way to overcome this problem is to allow functions to be
polymorphic, i.e. to have multiple types. For instance, we will be able to type
the identity as

$$\forall X.X \to X$$

meaning that it has type $X \to X$ for any possible value of the variable $X$.
OCaml features such types: variables beginning with a ' are implicitly universally
quantified, so that the identity has type 'a -> 'a.


Type schemes. Formally, a type $A$ is defined as before, and type schemes $\mathcal{A}$ are
generated by the grammar

$$\mathcal{A} ::= A \mid \forall X.\mathcal{A}$$

where $X$ is a type variable and $A$ is a type. In other words, a type scheme is
a type with some universally quantified variables at top level, i.e. a formula of
the form

$$\forall X_1.\ldots\forall X_n.A$$

Having such a “type” $\mathcal{A}$ for a term means that it can have any type in the set

$$[\![\mathcal{A}]\!] = \{A[A_1/X_1, \ldots, A_n/X_n] \mid A_1, \ldots, A_n \text{ types}\}$$

i.e. any type obtained by replacing the universally quantified type variables by
some types. As usual, in a type scheme $\forall X.\mathcal{A}$, the variable $X$ is bound in $\mathcal{A}$ and
could be renamed. The free variables of a type scheme are

$$\mathrm{FV}(\forall X_1.\ldots\forall X_n.A) = \mathrm{FV}(A) \setminus \{X_1, \ldots, X_n\}$$

Given a variable $X$ and a type $B$, we write $\mathcal{A}[B/X]$ for the type scheme $\mathcal{A}$ where
the variable $X$ has been replaced by $B$ (as usual, one has to properly take care
of bound variables):

$$(\forall X_1.\ldots\forall X_n.A)[B/X] = \forall X_1.\ldots\forall X_n.A[B/X]$$

whenever $X_i \notin \mathrm{FV}(B)$ for every $1 \leq i \leq n$. We consider type schemes modulo
$\alpha$-conversion, which can be defined by:

$$\forall X.\mathcal{A} = \forall Y.\mathcal{A}[Y/X]$$

We write $\mathcal{A} \sqsubseteq \mathcal{B}$ when the set of types described by $\mathcal{B}$ is included in the set
of types described by $\mathcal{A}$, i.e. $[\![\mathcal{B}]\!] \subseteq [\![\mathcal{A}]\!]$. In this case, we say that the type
scheme $\mathcal{A}$ is more general than $\mathcal{B}$, and that $\mathcal{B}$ is less general or a specialization of $\mathcal{A}$.


Lemma 4.4.4.1. We have

$$\forall X_1.\ldots\forall X_n.A \sqsubseteq \forall Y_1.\ldots\forall Y_m.B$$

if and only if there are types $A_1, \ldots, A_n$ such that $B = A[A_1/X_1, \ldots, A_n/X_n]$
and the $Y_i$ are variables which are not free in $\forall X_1.\ldots\forall X_n.A$.


When we have $\mathcal{A} \sqsubseteq \mathcal{B}$, this means that $\mathcal{B}$ was obtained from $\mathcal{A}$ by replacing
some universally quantified variables $X_i$ by types $A_i$, but not only: we can also
universally quantify some of the fresh variables introduced by the $A_i$ afterwards.
For instance, we have

$$\forall X.X \to X \sqsubseteq \forall Y.(Y \to Y) \to (Y \to Y) \sqsubseteq (Z \to Z) \to (Z \to Z)$$


Hindley-Milner typing system. We are now going to give a typing system for a
programming language whose terms are

$$t, u ::= x \mid \lambda x.t \mid t\,u \mid \mathtt{let}\ x = t\ \mathtt{in}\ u$$

Compared to $\lambda$-calculus, the only new construction is $\mathtt{let}\ x = t\ \mathtt{in}\ u$, which means
that we should declare $x$ to be $t$ in the term $u$. From an operational point
of view, it is thus the same as $(\lambda x.u)\,t$. The two constructions however differ
from the typing point of view: the type of a variable defined by a let will be
generalized, which means that we are going to universally quantify the type
variables we can, so that the type becomes polymorphic. A variable declared
by a let can thus be used with multiple types, which is not the case for an
argument of a function. For instance, in OCaml, we can define the identity once
with a let, and use it on an integer and on a string:


let () =
  let id = fun x -> x in
  print_int (id 3); print_string (id "a")


This is allowed because the type inferred for id is $\forall X.X \to X$, which is
polymorphic. On the other hand, the following code is rejected:


let () =
  (fun id ->
    print_int (id 3); print_string (id "a")
  ) (fun x -> x)

Namely, the type inferred for the argument id of the function is $X \to X$. During
the type inference, OCaml sees that it is applied to 3, and therefore replaces
$X$ by int, i.e. it guesses that the type of id must be int $\to$ int and thus
raises a type error when we also apply it to a string. The identity argument is
monomorphic: it can be applied to an integer, or to a string, but not both.
We now present an algorithm, due to Hindley and Milner [Hin69, Mil78],
which infers such types. A context $\Gamma$ is a list

$$x_1 : \mathcal{A}_1, \ldots, x_n : \mathcal{A}_n$$

consisting of pairs of variables and type schemes. The free variables of such a
context are

$$\mathrm{FV}(\Gamma) = \bigcup_{i=1}^{n} \mathrm{FV}(\mathcal{A}_i)$$

We will consider sequents of the form $\Gamma \vdash t : A$ where $\Gamma$ is a context, $t$ a term
and $A$ a type: we still infer a type (as opposed to a type scheme) for a term.
The rules for our typing system, which assigns type schemes to terms, are the
following ones:

$$\frac{\Gamma(x) = \mathcal{A} \qquad \mathcal{A} \sqsubseteq B}{\Gamma \vdash x : B}\ (\mathrm{ax})
\qquad
\frac{\Gamma \vdash t : A \qquad \Gamma, x : \forall_\Gamma A \vdash u : B}{\Gamma \vdash \mathtt{let}\ x = t\ \mathtt{in}\ u : B}\ (\mathrm{let})$$

$$\frac{\Gamma \vdash t : A \to B \qquad \Gamma \vdash u : A}{\Gamma \vdash t\,u : B}\ (\to_\mathrm{E})
\qquad
\frac{\Gamma, x : A \vdash t : B}{\Gamma \vdash \lambda x.t : A \to B}\ (\to_\mathrm{I})$$

The rules $(\to_\mathrm{E})$ and $(\to_\mathrm{I})$ for elimination and introduction of functions are the
usual ones. The rule (ax) allows specializing the type of a variable in the
context: if $x$ has type scheme $\mathcal{A}$ in the context $\Gamma$, then we can assume that it actually
has any type $B$ with $\mathcal{A} \sqsubseteq B$: with our above example, if id has the type scheme
$\forall X.X \to X$ in the context, then we can assume that it has type int $\to$ int
(or string $\to$ string) when we use it, and we can make different assumptions
at each use. Finally, the rule (let) states that if we can show that $t$ has type $A$,
then we can assume that it has the more general type scheme

$$\forall_\Gamma A = \forall X_1.\ldots\forall X_n.A$$

where $\mathrm{FV}(A) \setminus \mathrm{FV}(\Gamma) = \{X_1, \ldots, X_n\}$, called the generalization of $A$ with
respect to $\Gamma$. We thus universally quantify over all type variables which are not
already present in the context.
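The generalization operation can be sketched as follows. This illustration uses a representation which is an assumption of the example (and differs from the one used in the implementations below): a type scheme is encoded as the list of its quantified variables together with a type, and the names fv_ty, fv_scheme, fv_ctx and generalize are hypothetical.

```ocaml
type ty = TVar of int | TArr of ty * ty

(* The scheme "forall X1 ... Xn. A", as the list of the Xi and the type A. *)
type scheme = Forall of int list * ty

(* Free variables of a type, without duplicates. *)
let rec fv_ty = function
  | TVar x -> [x]
  | TArr (a, b) ->
    let l = fv_ty a in
    l @ List.filter (fun x -> not (List.mem x l)) (fv_ty b)

(* Free variables of a scheme: those of the type which are not quantified. *)
let fv_scheme (Forall (xs, a)) =
  List.filter (fun x -> not (List.mem x xs)) (fv_ty a)

(* Free variables of a context (possibly with duplicates). *)
let fv_ctx env = List.concat (List.map (fun (_, s) -> fv_scheme s) env)

(* Generalization of a with respect to env: quantify over the free
   variables of a which are not free in env. *)
let generalize env a =
  let bound = fv_ctx env in
  Forall (List.filter (fun x -> not (List.mem x bound)) (fv_ty a), a)
```

In the empty context, the type X -> X is generalized over X, whereas in a context containing x : X, the type X is left unchanged since X occurs in the context.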

Remark 4.4.4.2. In the rule (let), it is important that $\forall_\Gamma A$ does not universally
quantify over all the variables in $A$, but only those in $\mathrm{FV}(A) \setminus \mathrm{FV}(\Gamma)$. Suppose
that we did not have this restriction and quantified over all the variables of $A$.
We would then have the derivation

$$\dfrac{\dfrac{\dfrac{}{x : X \vdash x : X}\ (\mathrm{ax}) \qquad \dfrac{}{x : X, y : \forall X.X \vdash y : Y}\ (\mathrm{ax})}{x : X \vdash \mathtt{let}\ y = x\ \mathtt{in}\ y : Y}\ (\mathrm{let})}{\vdash \lambda x.\mathtt{let}\ y = x\ \mathtt{in}\ y : X \to Y}\ (\to_\mathrm{I})$$

This is clearly incorrect since the term $\lambda x.\mathtt{let}\ y = x\ \mathtt{in}\ y$ is essentially the
identity, and thus should have $X \to X$ as principal type.

The following proposition shows that this typing system amounts to the
simple one of section 4.4.1, where it would allow us to infer the type of an
expression each time we use it:


Proposition 4.4.4.3. The sequent $\Gamma \vdash \mathtt{let}\ x = t\ \mathtt{in}\ u : A$ is derivable if and only
if $t$ is typable in the context $\Gamma$ and $\Gamma \vdash u[t/x] : A$ is derivable.


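Algorithm W. Formulated as above, it is not clear how to write down an
algorithm which would infer the most general type of a term. The problem lies in
the (ax) rule: given a variable $x$ which has type scheme $\mathcal{A}$ in the context, we
have to come up with a type $B$ which specializes $\mathcal{A}$, and there is no obvious
way of doing so. Instead, we will replace all the universally quantified variables
of $\mathcal{A}$ by fresh variables, and will gradually compute a substitution which will fill
those in. The resulting algorithm is called algorithm W [DM82]. It is very close
to the algorithm we have seen in section 4.4.3 except that, instead of generating
type equations and solving them afterward in order to obtain a substitution, we
compute the substitution during the inference. We can express this algorithm
using sequents of the form

$$\Gamma \vdash t : A \mid \sigma$$

where $\Gamma$ is a context, $t$ a term, $A$ a type and $\sigma$ a substitution. The rules are the
following ones; they should be read as producing $A$ and $\sigma$ from $\Gamma$ and $t$:

$$\frac{\Gamma(x) = \mathcal{A}}{\Gamma \vdash x : {!\mathcal{A}} \mid \mathrm{id}}\ (\mathrm{ax})
\qquad
\frac{\Gamma, x : X \vdash t : B \mid \sigma \qquad X \text{ fresh}}{\Gamma \vdash \lambda x.t : X[\sigma] \to B \mid \sigma}\ (\to_\mathrm{I})$$

$$\frac{\Gamma \vdash t : C \mid \sigma \qquad \Gamma \vdash u : A \mid \sigma' \qquad X \text{ fresh} \qquad \sigma'' = \mathrm{mgu}(A \to X, C)}{\Gamma \vdash t\,u : X[\sigma''] \mid \sigma'' \circ \sigma' \circ \sigma}\ (\to_\mathrm{E})$$

$$\frac{\Gamma \vdash t : A \mid \sigma \qquad \Gamma[\sigma], x : \forall_{\Gamma[\sigma]} A \vdash u : B \mid \sigma'}{\Gamma \vdash \mathtt{let}\ x = t\ \mathtt{in}\ u : B \mid \sigma' \circ \sigma}\ (\mathrm{let})$$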


Those can be explained as follows.


(ax) Given the type scheme $\mathcal{A}$ associated to the variable $x$ in the context, we
     declare that the type for $x$ is $!\mathcal{A}$ under the identity substitution. Here, $!\mathcal{A}$
     is a notation for the instantiation of $\mathcal{A}$, by which we mean that we have
     replaced all the universally quantified variables by fresh ones (i.e. variables
     which do not already occur in $\Gamma$). If the type scheme $\mathcal{A}$ is $\forall X_1.\ldots\forall X_n.A$,
     the type $!\mathcal{A}$ is thus

     $$A[Y_1/X_1, \ldots, Y_n/X_n]$$

     where the variables $Y_i$ are fresh and distinct.


($\to_\mathrm{I}$) In order to infer the type of $\lambda x.t$, we have to guess a type for $x$ and infer
     the type of $t$ in the context where $x$ has this type. Since we have no idea of
     what this type should be, we simply infer the type of $t$ in the environment
     where $x$ has type $X$, a fresh type variable. This will result in a type $B$
     and a substitution $\sigma$ such that $\Gamma[\sigma], x : X[\sigma] \vdash t : B$ and therefore we can
     deduce that $\lambda x.t$ has type $X[\sigma] \to B$.


($\to_\mathrm{E}$) We first infer a type $A$ for $u$, and the type $C$ for $t$. In order for $t\,u$
     to be typable, $C$ should be of the form $A \to B$. We therefore use the
     unification procedure described in section 5.4 in order to compute the
     most general substitution $\sigma''$ such that $(A \to X)[\sigma''] = C[\sigma'']$ for some
     fresh variable $X$, and we will have $B = X[\sigma'']$; this substitution is
     written $\sigma'' = \mathrm{mgu}(A \to X, C)$ (here, “mgu” means most general unifier, see
     section 5.4.2). We deduce that $t\,u$ has the type $B$ we have computed.


(let) There is no real novelty in this rule compared to earlier. In order to infer
     the type of $\mathtt{let}\ x = t\ \mathtt{in}\ u$, we infer a type $A$ for $t$ and then infer a type
     $B$ for $u$ in the environment where $x$ has the type scheme obtained by
     generalizing $A$ with respect to $\Gamma$.


This algorithm generates a valid type according to the previous rules:

Theorem 4.4.4.4 (Correctness). If $\Gamma \vdash t : A \mid \sigma$ is derivable then $\Gamma[\sigma] \vdash t : A$ is
derivable.

Moreover, it is actually the most general one that could be inferred:

Theorem 4.4.4.5 (Principal types). Suppose that $\Gamma \vdash t : A \mid \sigma$ is derivable.
Then, for every substitution $\tau$ and type $B$ such that $\Gamma[\tau] \vdash t : B$ there exists a
substitution $\tau'$ such that $\tau = \tau' \circ \sigma$ and $B = A[\tau']$.


Example 4.4.4.6. Here are some principal types which can be computed with
the algorithm:

$$\lambda x.\mathtt{let}\ y = x\ \mathtt{in}\ y : X \to X$$
$$\lambda x.\mathtt{let}\ y = \lambda z.x\ \mathtt{in}\ y : X \to Y \to X$$
$$\lambda x.\mathtt{let}\ y = \lambda z.x\,z\ \mathtt{in}\ y : (X \to Y) \to (X \to Y)$$

Implementing algorithm W. Algorithm W can be coded by suitably implement- 
ing the above rules. We define the type of terms as 


type term = 
| Var of var 
| App of term * term 
| Abs of var * term 
| Let of var * term * term 


where var is an alias for int for clarity. Then type schemes are encoded as the
following type:


type ty =
  | EVar of int (* non-quantified variable *)
  | UVar of int (* universally quantified variable *)
  | TArr of ty * ty


Here, instead of universally quantifying some variables, we use two constructors:
UVar n is a variable which is universally quantified, and EVar n is a variable
which is not. The generation of fresh type variables can be achieved with a
counter, as usual:

let fresh =
  let n = ref (-1) in fun () -> incr n; EVar !n


Next, the instantiation of a type scheme is performed by replacing each
universally quantified variable with a fresh, non-quantified, one (we use a list tenv in
order to remember when a universal variable has already been replaced by some
variable during the current instantiation, in order to always replace it by the
same variable):


let inst a =
  (* The table is local to each call, so that every use of a type scheme
     gets its own fresh variables. *)
  let tenv = ref [] in
  let rec inst = function
    | UVar x ->
      if not (List.mem_assoc x !tenv) then
        tenv := (x, fresh ()) :: !tenv;
      List.assoc x !tenv
    | EVar x -> EVar x
    | TArr (a, b) -> TArr (inst a, inst b)
  in
  inst a

The following function checks whether a variable occurs in a type: 


let rec occurs x = function
  | EVar y -> x = y
  | UVar _ -> false
  | TArr (a, b) -> occurs x a || occurs x b


We can then generalize a type with respect to a given context by changing each 
variable EVar n which does not occur in the context into the corresponding 
universally quantified variable UVar n: 


let rec gen env = function 
| EVar x -> 
if List.exists (fun (_,a) -> occurs x a) env 
then EVar x else UVar x 
| UVar x -> UVar x 
| TArr (a, b) -> TArr (gen env a, gen env b) 


We can finally implement the function which will infer the type of a term in a 
given environment and return it together with the corresponding substitution. 
The four cases of the match correspond to the four different rules above: 


let rec infer env = function
  | Var x ->
    let a =
      try List.assoc x env
      with Not_found -> raise Type_error
    in
    inst a, Subst.id
  | Abs (x, t) ->
    let a = fresh () in
    let b, s = infer ((x,a)::env) t in
    TArr (Subst.app s a, b), s
  | App (t, u) ->
    let a, su = infer env u in
    let b = fresh () in
    let c, st = infer env t in
    let s = unify (TArr (a, b)) c in
    Subst.app s b, Subst.comp s (Subst.comp su st)
  | Let (x, t, u) ->
    let a, st = infer env t in
    let b, su = infer ((x,gen (Subst.app_env st env) a)::env) u in
    b, Subst.comp su st

We have implemented substitutions as functions int -> ty associating a type to
a type variable. The functions Subst.id, Subst.comp, Subst.app and Subst.app_env
respectively compute the identity substitution, the composite of substitutions
and the application of a substitution to a type and to an environment. Their
implementation is left to the reader, along with the one of the function
Subst.make used below, which builds a substitution from a list of pairs.
Finally, above, the function unify implements the unification algorithm
described in section 5.4:


let rec unify l =
  match l with
  | (EVar x, EVar y)::l when x = y -> unify l
  | (EVar x, b)::l ->
    if occurs x b then raise Type_error;
    let s = Subst.make [x, b] in
    (* Replace x by b in the remaining equations before solving them. *)
    let l = List.map (fun (a, b) -> Subst.app s a, Subst.app s b) l in
    Subst.comp (unify l) s
  | (a, EVar x)::l ->
    unify ((EVar x, a)::l)
  | (TArr (a, b), TArr (a', b'))::l ->
    unify ([a, a'; b, b']@l)
  | (UVar _, _)::_ | (_, UVar _)::_ -> assert false
  | [] -> Subst.id
let unify a b = unify [a, b]
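For the reader who wants to run the above code, here is one possible implementation of the module Subst. It is only a sketch: the text specifies that substitutions are functions int -> ty, but the precise signatures, and in particular the argument order of comp (the second argument being applied first, consistently with the way it is used above), are assumptions.

```ocaml
type ty =
  | EVar of int (* non-quantified variable *)
  | UVar of int (* universally quantified variable *)
  | TArr of ty * ty

module Subst = struct
  (* A substitution maps the number of a variable to a type. *)
  type t = int -> ty

  (* Identity substitution. *)
  let id : t = fun x -> EVar x

  (* Substitution given by the list of the images of its domain. *)
  let make (l : (int * ty) list) : t =
    fun x -> try List.assoc x l with Not_found -> EVar x

  (* Apply a substitution to a type (quantified variables are left as is). *)
  let rec app (s : t) = function
    | EVar x -> s x
    | UVar x -> UVar x
    | TArr (a, b) -> TArr (app s a, app s b)

  (* Composite: [comp s' s] applies [s] first and then [s']. *)
  let comp (s' : t) (s : t) : t = fun x -> app s' (s x)

  (* Apply a substitution to every type of an environment. *)
  let app_env (s : t) env = List.map (fun (x, a) -> x, app s a) env
end
```

Representing substitutions as functions makes the composition immediate, at the price of not being able to inspect their domain; an association list would be a reasonable alternative.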


Algorithm J. The previous algorithm is theoretically nice. In particular, it is 
well adapted to making correctness proofs. However, it is quite inefficient: we 
have to apply substitutions to many types (including to the context) and we 
have to go through the context to look for type variables which have been used. 
As in section 4.4.3, the solution is to modify type variables in-place by using 
references. The resulting algorithm is sometimes called algorithm J. We thus 
change the implementation of types to 


type ty =
  | EVar of tvar ref (* non-quantified variable *)
  | UVar of int (* universally quantified variable *)
  | TArr of ty * ty
and tvar =
  | Unbd of int (* unbound variable *)
  | Link of ty (* substituted variable *)


Most functions are adapted straightforwardly. The main novelty is in the
unification function, which now performs the modification of types in-place:


let rec unify a b =
  match unlink a, unlink b with
  | EVar x, EVar y when x == y -> () (* same variable: nothing to do *)
  | EVar x, b ->
    if occurs x b then raise Type_error else x := Link b
  | _, EVar _ -> unify b a
  | TArr (a1, a2), TArr (b1, b2) -> unify a1 b1; unify a2 b2
  | _ -> raise Type_error


and the type inference function which is simpler to write, because it does not 
need to propagate the substitutions: 


let rec infer env = function
  | Var x -> (try inst (List.assoc x env)
              with Not_found -> raise Type_error)
  | Abs (x, t) ->
    let a = fresh () in
    let b = infer ((x,a)::env) t in
    TArr (a, b)
  | App (t, u) ->
    let a = infer env u in
    let b = fresh () in
    let c = infer env t in
    unify (TArr (a,b)) c;
    b
  | Let (x, t, u) ->
    let a = infer env t in
    infer ((x, gen env a)::env) u


The substitutions are now performed very efficiently because we do not have 
to go through terms anymore: references are doing the job for us. There is, 
however, one last source of inefficiency in this code: in the function unify, 
the function occurs x b has to go through all the type b to see whether the 
variable x occurs in it or not. There is a very elegant solution to this due to 
Rémy [Rém92] that we learned from [Kis13]. To each type variable, we are going 
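to assign an integer called its level, which indicates the depth of let-declaration
when it was created. Initially, the level is 0 by convention and in an expression
$\mathtt{let}\ x = t\ \mathtt{in}\ u$ at level n, the variables created by t will have level n + 1, whereas
the variables of u will still have level n (it is some sort of de Bruijn index). For
instance, in the term

\[
\mathtt{let}\ a = (\mathtt{let}\ b = \lambda x.x\ \mathtt{in}\ \lambda y.y)\ \mathtt{in}\ \lambda z.z
\]

the type variables associated to x, y and z will have level 2, 1 and 0 respectively.
This can be figured graphically as follows, each box standing for the term displayed
on the line above:

\[
\begin{array}{ll}
\text{level 2:} & \lambda x.x \\
\text{level 1:} & (\mathtt{let}\ b = \square\ \mathtt{in}\ \lambda y.y) \\
\text{level 0:} & \mathtt{let}\ a = \square\ \mathtt{in}\ \lambda z.z
\end{array}
\]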


One can convince oneself that, in a term $\mathtt{let}\ x = t\ \mathtt{in}\ u$, the variables which
should be generalized in the rule (let) are those which were “locally created”
during the inference of t, i.e. those which are at a strictly higher level than
the current one. We thus modify our implementation once more. We begin by
declaring a global reference, which will record the current level when performing
the type inference, along with two functions in order to increase and decrease
the current level:


let level = ref 0 
let enter_level () = incr level 
let leave_level () = decr level 


We also change the representation of type variables: the constructor for unbound 
variables becomes 


| Unbd of int * int (* unbound variable (name / level) *) 


It now takes two integers as argument: the number of the variable (acting as its 
name) and its level. In the generalization function, we only generalize variables 
which are above the current level: 


let rec gen a =
  match a with
  | EVar x ->
    if tlevel x <= !level then EVar x
    else UVar (tname x)
  | UVar x -> UVar x
  | TArr (a, b) -> TArr (gen a, gen b)


where tname and tlevel respectively return the name and level of a type variable.
Finally, levels get updated in the infer function whose only change is in
the Let case:


  | Let (x, t, u) ->
    enter_level ();
    let a = infer env t in
    leave_level ();
    infer ((x, gen a)::env) u


We increase the current level when typechecking the definition and decrease it 
afterward. 


Example 4.4.4.7. In the function

\[ \lambda x.\mathtt{let}\ y = \lambda z.z\ \mathtt{in}\ y \]

the type variable Z associated to z has level 1, so that it gets generalized in the
type of y, because y is declared at level 0 and 0 < 1: in the environment, y will
have the type scheme $\forall Z.Z \to Z$. However, in the function

\[ \lambda x.\mathtt{let}\ y = \lambda z.x\ \mathtt{in}\ y \]

the type variable X associated to x does not get generalized because it is of
level 0, so that y has the type scheme $\forall Y.Y \to X$, and not $\forall X.\forall Y.Y \to X$.


There is a catch however: it might happen that, during unification (see the
function unify above), a variable X with low level $\ell$ gets substituted with a
type A containing variables of high level. In this case, for every variable in A,
the level should be lowered to the minimum of this level and $\ell$ before performing
the substitution: the level gets “contaminated” by the one of the variable it is
unified with. However, we are smart and see that the function occurs is already
going through the type just before we substitute, and it is the only place where
it is used, so that we can use it to both check the occurrence and update the
levels. We therefore change it to


let rec occurs x a =
  match unlink a with
  | EVar y when x = y -> raise Type_error
  | EVar y ->
    let l = tlevel y in
    let l = match !x with Unbd (_,l') -> min l l' | _ -> l in
    y := Unbd (tname y, l)
  | UVar _ -> ()
  | TArr (a, b) -> occurs x a; occurs x b


which changes the level of all the variables it encounters to the minimum of their
old level and the level of x. Without this modification of occurs, for the term

\[ \lambda x.\mathtt{let}\ y = \lambda z.x\,z\ \mathtt{in}\ y \]

we would infer the unsound type $(X \to Y) \to (Z \to W)$ instead of the expected
type $(X \to Y) \to (X \to Y)$.
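
The pieces introduced above can be assembled into a small, self-contained sketch of level-based inference. It is only an illustration under assumptions: the helpers fresh, unlink and inst are not given in the text and are written here in one plausible way, references are compared with physical equality, and unify calls occurs for its side effects (occurs check and level update) as in the last version of occurs.

```ocaml
type ty =
  | EVar of tvar ref
  | UVar of int
  | TArr of ty * ty
and tvar =
  | Unbd of int * int   (* unbound variable (name / level) *)
  | Link of ty

type term =
  | Var of string
  | Abs of string * term
  | App of term * term
  | Let of string * term * term

exception Type_error

let level = ref 0
let enter_level () = incr level
let leave_level () = decr level

(* Assumed helper: fresh variable created at the current level. *)
let fresh =
  let n = ref 0 in
  fun () -> incr n; EVar (ref (Unbd (!n, !level)))

let rec unlink = function
  | EVar { contents = Link a } -> unlink a
  | a -> a

(* Occurs check for x in a, lowering the levels in a to the level of x. *)
let rec occurs x a =
  match unlink a with
  | EVar y when x == y -> raise Type_error
  | EVar y ->
    (match !x, !y with
     | Unbd (_, l), Unbd (n, l') -> y := Unbd (n, min l l')
     | _ -> ())
  | UVar _ -> ()
  | TArr (a, b) -> occurs x a; occurs x b

let rec unify a b =
  match unlink a, unlink b with
  | EVar x, EVar y when x == y -> ()
  | EVar x, b -> occurs x b; x := Link b
  | a, EVar y -> occurs y a; y := Link a
  | TArr (a1, a2), TArr (b1, b2) -> unify a1 b1; unify a2 b2
  | _ -> raise Type_error

(* Generalize the variables whose level is above the current one. *)
let rec gen a =
  match unlink a with
  | EVar { contents = Unbd (n, l) } when l > !level -> UVar n
  | TArr (a, b) -> TArr (gen a, gen b)
  | a -> a

(* Assumed helper: replace quantified variables by fresh ones. *)
let inst a =
  let tbl = Hashtbl.create 8 in
  let rec aux a =
    match unlink a with
    | UVar n ->
      (match Hashtbl.find_opt tbl n with
       | Some b -> b
       | None -> let b = fresh () in Hashtbl.add tbl n b; b)
    | TArr (a, b) -> TArr (aux a, aux b)
    | a -> a
  in
  aux a

let rec infer env = function
  | Var x -> (try inst (List.assoc x env) with Not_found -> raise Type_error)
  | Abs (x, t) ->
    let a = fresh () in
    let b = infer ((x, a) :: env) t in
    TArr (a, b)
  | App (t, u) ->
    let a = infer env u in
    let b = fresh () in
    let c = infer env t in
    unify (TArr (a, b)) c;
    b
  | Let (x, t, u) ->
    enter_level ();
    let a = infer env t in
    leave_level ();
    infer ((x, gen a) :: env) u
```

On the term Abs ("x", Let ("y", Abs ("z", App (Var "x", Var "z")), Var "y")), the variables created for z and for the result of x z start at level 1, are lowered to level 0 when unified with the type of x, and are therefore not generalized: the resulting type has the shape (X → Y) → (X → Y).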


4.4.5 Bidirectional type checking. We present here another approach to 
type checking, which does not try to come up with new or most general types: 
this means that we will fail to infer the type for terms when we are not certain 
about this type (for instance, if the term can have multiple types). However, we 
will try to exploit as much as possible the already known type information about 
terms. This is less powerful than previous methods in the context of $\lambda$-calculus,
but has the advantage of being simple to implement and of generalizing well to 
richer logics, where principal types do not exist or type inference is undecidable, 
see chapter 8. 

When implementing type inference, we can see that two different phases are 
actually involved: 


— type inference: we come up with a type for the term, 
— type checking: we make sure that the term has a given type. 


For instance, when performing the type inference for a term t u, we first infer
the type for t, which should be of the form $A \to B$, and then we check that u
has type A. Of course, this checking part is usually done by inferring a type
for u and comparing it with A, but in some situations we can exploit the fact
that we are checking that the term t has type A, and that this A does bring
us some information. For instance, we can check that the term $\lambda x.x$ has the
type $X \to X$, but we cannot unambiguously infer a type for $\lambda x.x$ because it
admits multiple types (we have seen that there are canonical choices such as the
principal type, see section 4.4.2, but here we do not want to make any choice
for the user).
This suggests splitting the usual typing judgment $\Gamma \vdash t : A$ in two:


— $\Gamma \vdash t \Rightarrow A$: we infer the type A for the term t in the context $\Gamma$,

— $\Gamma \vdash t \Leftarrow A$: we check that the term t has type A in the context $\Gamma$.

We will consider terms of the form

\[ t, u ::= x \mid \lambda x.t \mid t\,u \mid (t : A) \]


The only new construction is the last one, (t : A), which means “check that t
has type A”. It will become handy since it allows bringing type information in
terms and is already present in languages such as OCaml, where we can define 
the identity function on integers by 

let id = fun x -> (x : int) 


The rules for type inference and checking are the following ones: 


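\[
\frac{}{\Gamma \vdash x \Rightarrow \Gamma(x)}\,(\text{ax})
\qquad
\frac{\Gamma \vdash t \Rightarrow A \to B \quad \Gamma \vdash u \Leftarrow A}{\Gamma \vdash t\,u \Rightarrow B}\,(\to_E)
\qquad
\frac{\Gamma, x : A \vdash t \Leftarrow B}{\Gamma \vdash \lambda x.t \Leftarrow A \to B}\,(\to_I)
\]
\[
\frac{\Gamma \vdash t \Leftarrow A}{\Gamma \vdash (t : A) \Rightarrow A}\,(\text{cast})
\qquad
\frac{\Gamma \vdash t \Rightarrow A}{\Gamma \vdash t \Leftarrow A}\,(\text{sub})
\]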


They read as follows: 


(ax) If we know that x has type A then we can come up with a type for x:
namely A.


($\to_E$) If we can infer a type $A \to B$ for t and check that u has type A then we
can infer the type B for t u.


($\to_I$) In order to check that $\lambda x.t$ has type $A \to B$, we should check that t has
type B when x has type A.


(cast) We can infer the type A for (t : A) provided that t actually has type A.


(sub) This subsumption rule states that, as last resort, if we do not know how
to check that a term t has type A, we can go back to the old method of
inferring a type for it and ensuring that this type is A.


Note that there is no rule for inferring the type of $\lambda x.t$, because there is no way
to come up with a type for x without type annotations. Again, this means that
we cannot infer a type for the identity $\lambda x.x$, but we can in presence of type
annotations:


\[
\dfrac{\dfrac{\dfrac{\dfrac{}{x : A \vdash x \Rightarrow A}\,(\text{ax})}{x : A \vdash x \Leftarrow A}\,(\text{sub})}{\vdash \lambda x.x \Leftarrow A \to A}\,(\to_I)}{\vdash (\lambda x.x : A \to A) \Rightarrow A \to A}\,(\text{cast})
\]


An implementation is provided in figure 4.6: the two modes (type inference
and checking) are implemented by two mutually recursive functions (infer and
check). There are two kinds of errors that can be raised: Type_error means
that the term is ill-typed as usual, and Cannot_infer means that the algorithm
could not come up with a type, but the term might still be typable.
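
As a quick illustration of the two modes and of the two kinds of errors, the following sketch repeats the definitions of figure 4.6 in compact form (so that it can be run on its own) and tries them on the identity. The names id and a_to_a are introduced here only for the example.

```ocaml
type ty = TVar of string | TArr of ty * ty

type term =
  | Var of string
  | App of term * term
  | Abs of string * term
  | Cast of term * ty

exception Cannot_infer
exception Type_error

let rec infer env = function
  | Var x -> (try List.assoc x env with Not_found -> raise Type_error)
  | App (t, u) ->
    (match infer env t with
     | TArr (a, b) -> check env u a; b
     | _ -> raise Type_error)
  | Abs _ -> raise Cannot_infer
  | Cast (t, a) -> check env t a; a

and check env t a =
  match t, a with
  | Abs (x, t), TArr (a, b) -> check ((x, a) :: env) t b
  | _ -> if infer env t <> a then raise Type_error

let id = Abs ("x", Var "x")
let a_to_a = TArr (TVar "A", TVar "A")

let () =
  (* The bare identity cannot be inferred... *)
  assert (try ignore (infer [] id); false with Cannot_infer -> true);
  (* ...but it can be once annotated, *)
  assert (infer [] (Cast (id, a_to_a)) = a_to_a);
  (* and it can always be checked against a given arrow type. *)
  check [] id a_to_a
```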


Example 4.4.5.1. For illustration purposes, we suppose we have access to real
(or float) numbers, of type $\mathbb{R}$, with the rule

\[ \Gamma \vdash r \Rightarrow \mathbb{R} \]

for every real number r. We also suppose that we have access to the usual
mathematical functions (addition, multiplication), as well as a function which
computes the mean of a function between two points, i.e. $\Gamma$ contains

\[ \mathrm{mean} : (\mathbb{R} \to \mathbb{R}) \to \mathbb{R} \to \mathbb{R} \to \mathbb{R} \]


We can then type 


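\[
\dfrac{
  \dfrac{
    \dfrac{
      \Gamma \vdash \mathrm{mean} \Rightarrow (\mathbb{R} \to \mathbb{R}) \to \mathbb{R} \to \mathbb{R} \to \mathbb{R}
      \quad
      \dfrac{\dfrac{\Gamma, x : \mathbb{R} \vdash x \Rightarrow \mathbb{R}}{\Gamma, x : \mathbb{R} \vdash x \Leftarrow \mathbb{R}}}{\Gamma \vdash \lambda x.x \Leftarrow \mathbb{R} \to \mathbb{R}}
    }{\Gamma \vdash \mathrm{mean}\,(\lambda x.x) \Rightarrow \mathbb{R} \to \mathbb{R} \to \mathbb{R}}
    \quad
    \dfrac{\Gamma \vdash 5 \Rightarrow \mathbb{R}}{\Gamma \vdash 5 \Leftarrow \mathbb{R}}
  }{\Gamma \vdash \mathrm{mean}\,(\lambda x.x)\,5 \Rightarrow \mathbb{R} \to \mathbb{R}}
  \quad
  \dfrac{\Gamma \vdash 7 \Rightarrow \mathbb{R}}{\Gamma \vdash 7 \Leftarrow \mathbb{R}}
}{\Gamma \vdash \mathrm{mean}\,(\lambda x.x)\,5\,7 \Rightarrow \mathbb{R}}
\]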


However, we cannot infer the type for the function

\[ \lambda f x y.(f\,x + f\,y)/2 \]

which would be the definition of mean. When defining a function, we have to
give its type and cast it accordingly: we can type

\[ (\lambda f x y.(f\,x + f\,y)/2 : (\mathbb{R} \to \mathbb{R}) \to \mathbb{R} \to \mathbb{R} \to \mathbb{R}) \]

This is why in a programming language such as Agda you have to declare the
type of a function when defining it:


mean : (ℝ → ℝ) → ℝ → ℝ → ℝ
mean f x y = (f x + f y) / 2


Remark 4.4.5.2. If we omit the rule (cast), it is interesting to note that the
terms v such that $\Gamma \vdash v \Leftarrow A$ and the terms n such that $\Gamma \vdash n \Rightarrow A$ is derivable
for some context $\Gamma$ and type A are respectively generated by the grammars

\[ v ::= \lambda x.v \mid n \qquad\qquad n ::= x \mid n\,v \]

which is precisely the traditional definition of values (also called normal forms)
and neutral terms (already encountered in section 3.5.2 for instance).
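
The two grammars of the remark translate directly into a pair of mutually recursive predicates. The following is a small sketch on untyped terms without casts; the names is_value and is_neutral are chosen here for illustration.

```ocaml
type term =
  | Var of string
  | App of term * term
  | Abs of string * term

(* v ::= λx.v | n        n ::= x | n v *)
let rec is_value = function
  | Abs (_, t) -> is_value t
  | t -> is_neutral t

and is_neutral = function
  | Var _ -> true
  | App (t, u) -> is_neutral t && is_value u
  | Abs _ -> false
```

A term containing a β-redex, such as App (Abs ("x", Var "x"), Var "y"), is rejected by both predicates, which matches the reading of values as normal forms.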


4.5 Hilbert calculus and combinators 


We have seen in section 3.6.3 that every $\lambda$-term can be expressed using application
and the two combinators S and K, respectively corresponding to the
$\lambda$-terms

\[ S = \lambda x y z.(x\,z)\,(y\,z) \qquad\qquad K = \lambda x y.x \]
(** Types. *)
type ty =
  | TVar of string
  | TArr of ty * ty

type var = string

(** Terms. *)
type term =
  | Var of var
  | App of term * term
  | Abs of var * term
  | Cast of term * ty

exception Cannot_infer
exception Type_error

(** Type inference. *)
let rec infer env = function
  | Var x ->
    (try List.assoc x env with Not_found -> raise Type_error)
  | App (t, u) ->
    (
      match infer env t with
      | TArr (a, b) -> check env u a; b
      | _ -> raise Type_error
    )
  | Abs (x, t) -> raise Cannot_infer
  | Cast (t, a) -> check env t a; a

(** Type checking. *)
and check env t a =
  match t , a with
  | Abs (x, t) , TArr (a, b) -> check ((x, a)::env) t b
  | _ -> if infer env t <> a then raise Type_error


Figure 4.6: Bidirectional type checking.

We have also seen in example 4.4.2.5 that the principal types of those terms are
respectively

\[ (X \to Y \to Z) \to (X \to Y) \to X \to Z \qquad\text{and}\qquad X \to Y \to X \]

which means that they respectively have the type

\[ (A \to B \to C) \to (A \to B) \to A \to C \qquad\text{and}\qquad A \to B \to A \]

for all types A, B and C.

It is thus natural to consider a typed version of combinatory terms (see
section 3.6.3) expressed by rules, where types and contexts are defined as above,
and sequents are of the form

\[ \Gamma \vdash t : A \]

where $\Gamma$ is a context, t is a combinatory term and A is a type. The rules are

\[
\frac{}{\Gamma \vdash x : \Gamma(x)}\,(\text{ax})
\qquad
\frac{}{\Gamma \vdash S : (A \to B \to C) \to (A \to B) \to A \to C}\,(S)
\]
\[
\frac{}{\Gamma \vdash K : A \to B \to A}\,(K)
\qquad
\frac{\Gamma \vdash t : A \to B \quad \Gamma \vdash u : A}{\Gamma \vdash t\,u : B}\,(\to_E)
\]

where in (ax) we suppose that $x \in \operatorname{dom}(\Gamma)$. If we apply an analogue of the term
erasing procedure (section 4.1.7), we obtain the following logical system:


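\[
\frac{}{\Gamma, A, \Gamma' \vdash A}\,(\text{ax})
\qquad
\frac{}{\Gamma \vdash (A \Rightarrow B \Rightarrow C) \Rightarrow (A \Rightarrow B) \Rightarrow A \Rightarrow C}\,(S)
\]
\[
\frac{}{\Gamma \vdash A \Rightarrow B \Rightarrow A}\,(K)
\qquad
\frac{\Gamma \vdash A \Rightarrow B \quad \Gamma \vdash A}{\Gamma \vdash B}\,(\Rightarrow_E)
\]

which is precisely the Hilbert calculus described in section 2.7! In other words,
in the same way that natural deduction corresponds, via the Curry-Howard
correspondence, to simply typed $\lambda$-calculus, Hilbert calculus corresponds to typed
combinatory terms. This was first observed by Curry [CF58].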
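Example 4.5.0.1. We have seen in example 3.6.3.1 that, in combinatory logic,
identity could be expressed as

\[ I = S\,K\,K \]

Its typing derivation is

\[
\dfrac{
  \dfrac{
    \dfrac{}{\vdash S : (A \to (B \to A) \to A) \to (A \to B \to A) \to A \to A}\,(S)
    \quad
    \dfrac{}{\vdash K : A \to (B \to A) \to A}\,(K)
  }{\vdash S\,K : (A \to B \to A) \to A \to A}\,(\to_E)
  \quad
  \dfrac{}{\vdash K : A \to B \to A}\,(K)
}{\vdash S\,K\,K : A \to A}\,(\to_E)
\]

from which, by term erasure, we recover the proof of $A \Rightarrow A$ in Hilbert calculus
given in example 2.7.1.1.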
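
The same computation can be replayed in OCaml, as a small sketch: defining the two combinators as ordinary functions (the lowercase names s, k and i are ours), the type checker infers for them exactly the types of the rules (S) and (K), and s k k gets the type of the identity.

```ocaml
(* val s : ('a -> 'b -> 'c) -> ('a -> 'b) -> 'a -> 'c *)
let s x y z = x z (y z)

(* val k : 'a -> 'b -> 'a *)
let k x _ = x

(* val i : 'a -> 'a   (eta-expanded so that it stays polymorphic) *)
let i x = s k k x

let () =
  assert (i 3 = 3);
  assert (i "a" = "a")
```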
4.6 Classical logic


Since classical logic is an extension of intuitionistic logic, in the sense that we
have more rules, we can expect that the Curry-Howard correspondence can
be extended to classical logic. For various reasons, it has been thought for a
long time that classical logic had no computational content, one being that
naively imposing $A \Leftrightarrow \neg\neg A$ makes all proofs of a given type equal, see
section 2.5.4. It was thus somewhat of a surprise when Parigot introduced the
$\lambda\mu$-calculus [Par92], which is an extension of the $\lambda$-calculus suitable for classical
logic. In section 2.5.9, we have analyzed the proof of $\neg A \lor A$ (or rather its encoding
in intuitionistic logic). The main ingredient is the ability to “roll back” to a
previous proof goal at any point in the proof: we prove $\neg A$ and at some point
we change our mind, go back to proving $\neg A \lor A$, and choose to prove A instead.
In $\lambda\mu$-calculus, this is achieved by a sort of “exception” mechanism: instead
of going on with the computation, we might raise an exception which is going
to be caught and change the computation flow. However, in this calculus, the
exceptions follow a very particular discipline, making them behave not exactly
as in usual languages such as OCaml.


4.6.1 Felleisen’s C. Let us naively try to extend the Curry-Howard
correspondence to classical logic. For clarity, we write here $\bot$ instead of 0 for
the type corresponding to falsity. Starting from implicative intuitionistic natural
deduction, classical logic can be obtained by adding the rule

\[ \frac{\Gamma \vdash \neg\neg A}{\Gamma \vdash A}\,(\neg\neg_E) \]

corresponding to double negation elimination. This suggests that we should add
a corresponding construction, say C(t), to our term calculus together with the
typing rule

\[ \frac{\Gamma \vdash t : \neg\neg A}{\Gamma \vdash \mathcal{C}(t) : A}\,(\neg\neg_E) \]

This calculus allows for a “static” Curry-Howard correspondence, in a sense similar
to theorem 4.1.7.1: there is a bijective correspondence between typing derivations
of $\lambda$-terms with C and proofs in natural deduction with double negation
elimination.

In order to hopefully extend this to a dynamical correspondence, we need
to introduce a notion of reduction, which should correspond to cut elimination.
First note that, unlike previously, here we do not need to add an introduction
rule for double negation. Instead, recalling that $\neg A = A \to \bot$, we can construct
a proof of $\neg\neg A$ from a proof of A as follows:

\[
\dfrac{
  \dfrac{
    \dfrac{}{\Gamma, k : A \to \bot \vdash k : A \to \bot}\,(\text{ax})
    \quad
    \dfrac{\Gamma \vdash t : A}{\Gamma, k : A \to \bot \vdash t : A}\,(\text{wk})
  }{\Gamma, k : A \to \bot \vdash k\,t : \bot}\,(\to_E)
}{\Gamma \vdash \lambda k^{A \to \bot}.k\,t : (A \to \bot) \to \bot}\,(\to_I)
\]

In other words, the introduction rule for double negation should be

\[ \frac{\Gamma \vdash t : A}{\Gamma \vdash \lambda k^{\neg A}.k\,t : \neg\neg A}\,(\neg\neg_I) \]


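We can therefore “guess” the reduction rule associated to C by observing the
corresponding cut elimination:

\[
\dfrac{\dfrac{\dfrac{\pi}{\Gamma \vdash t : A}}{\Gamma \vdash \lambda k^{\neg A}.k\,t : \neg\neg A}\,(\neg\neg_I)}{\Gamma \vdash \mathcal{C}(\lambda k^{\neg A}.k\,t) : A}\,(\neg\neg_E)
\qquad\rightsquigarrow\qquad
\dfrac{\pi}{\Gamma \vdash t : A}
\]

The $\beta$-reduction rule should thus be

\[ \mathcal{C}(\lambda k^{\neg A}.k\,t) \longrightarrow_\beta t \]

Note that this rule only makes sense when k does not occur in t, otherwise the
bound variable k could escape its scope... This is indeed the main reduction
rule associated to C, but it turns out that two more reduction rules are required
for C:

\[
\begin{array}{rcl}
\mathcal{C}(\lambda k^{\neg A}.k\,t) & \longrightarrow_\beta & t \qquad \text{if } k \notin \mathrm{FV}(t) \\
\mathcal{C}(\lambda k^{\neg(A \to B)}.t)\,u & \longrightarrow_\beta & \mathcal{C}(\lambda k'^{\neg B}.t[\lambda f^{A \to B}.k'\,(f\,u)/k]) \\
\mathcal{C}(\lambda k^{\neg A}.k\,\mathcal{C}(\lambda k'^{\neg A}.t)) & \longrightarrow_\beta & \mathcal{C}(\lambda k''^{\neg A}.t[k''/k, k''/k'])
\end{array}
\]

The second rule states that the application to the argument u goes through
under C: if our calculus had products or coproducts, similar rules would have
to be added in order to enforce their compatibility with C. The third rule states
that we can merge two uses of C on the same type.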

Let us try to understand what this could mean. Suppose given a term v
of type $\neg\neg A$. Since $\neg\neg A = (A \to \bot) \to \bot$, this means that v must be an
abstraction taking an argument k of type $A \to \bot$ and returning a value of type $\bot$,
i.e. v will reduce to a term of the form $\lambda k^{A \to \bot}.u$. Since there is no introduction
rule for $\bot$ (there is no way of directly constructing a term of type $\bot$), at some
point during the evaluation of u, it must apply k to some argument t of type A
in order to produce the value of type $\bot$, i.e. v will reduce to $\lambda k^{A \to \bot}.k\,t$. Thus,
C(v) will reduce to $\mathcal{C}(\lambda k^{A \to \bot}.k\,t)$, which will reduce to t. The reduction path is thus

\[ \mathcal{C}(v) \longrightarrow_\beta^* \mathcal{C}(\lambda k^{\neg A}.u) \longrightarrow_\beta^* \mathcal{C}(\lambda k^{\neg A}.k\,t) \longrightarrow_\beta t \]


This means that C(v) waits for v to apply its argument k to some term t of 
type A and returns this argument t. The term k can thus be thought of as an 
analogue of return in some languages such as C, or maybe also as the raise 
operator of OCaml which raises exceptions (more on this later on). However, 
things are more subtle here because the returned term might itself use some of 
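the terms computed during the evaluation of v. In order to see that in action,
let us compute the term associated to the usual proof of $\neg A \lor A$ (see page 63).
Reading from top to bottom, each judgment is obtained from the previous one by
the indicated rule (the axiom premises for k being left implicit):

\[
\begin{array}{ll}
k : \neg(\neg A \lor A), a : A \vdash a : A & (\text{ax}) \\
k : \neg(\neg A \lor A), a : A \vdash \iota_r(a) : \neg A \lor A & (\lor_I^r) \\
k : \neg(\neg A \lor A), a : A \vdash k\,\iota_r(a) : \bot & (\to_E) \\
k : \neg(\neg A \lor A) \vdash \lambda a^A.k\,\iota_r(a) : \neg A & (\to_I) \\
k : \neg(\neg A \lor A) \vdash \iota_l(\lambda a^A.k\,\iota_r(a)) : \neg A \lor A & (\lor_I^l) \\
k : \neg(\neg A \lor A) \vdash k\,\iota_l(\lambda a^A.k\,\iota_r(a)) : \bot & (\to_E) \\
\vdash \lambda k^{\neg(\neg A \lor A)}.k\,\iota_l(\lambda a^A.k\,\iota_r(a)) : \neg\neg(\neg A \lor A) & (\to_I) \\
\vdash \mathcal{C}(\lambda k^{\neg(\neg A \lor A)}.k\,\iota_l(\lambda a^A.k\,\iota_r(a))) : \neg A \lor A & (\neg\neg_E)
\end{array}
\]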


As indicated above, the term $\mathcal{C}(\lambda k.k\,\iota_l(\lambda a^A.k\,\iota_r(a)))$ cannot reasonably reduce
to

\[ t = \iota_l(\lambda a^A.k\,\iota_r(a)) \]

because the variable k occurs in t. The additional rules make it so that it
however acts as t, i.e. it states that it is a proof of $\neg A = A \to \bot$, albeit being
surrounded by $\mathcal{C}(\lambda k.k \ldots)$. If, at some point, we use this proof and apply it
to some argument u of type A, the term will thus reduce to

\[ \mathcal{C}(\lambda k.k\,\iota_r(u)) \]

which in turn will reduce to $\iota_r(u)$ by the reduction rule associated to C. It
thus fakes being a proof of $\neg A$ until we actually use this proof and apply it to
some argument u of type A, at which point it changes its mind and declares
that it was actually a proof of A, namely u. This is exactly the behavior we
were describing in section 2.5.2, when explaining that classical logic allows
“resetting” proofs.


Variants of the calculus. The operator C is due to Felleisen [FH92] and the 
observation that it could be typed by double negation elimination was first 
made by Griffin [Gri89], see also [SU06, chapter 7]. Many small variations of 
the calculus are possible. First note that we could add C (as opposed to C(t)) 
as a constant to the language, which corresponds to adding double negation 
elimination as an axiom instead of a rule: 


\[ \Gamma \vdash \mathcal{C} : \neg\neg A \to A \]

If we use Clavius’ law instead of double negation elimination, see theorem 2.5.1.1,
then we would have defined an operator cc called callcc (for call with current
continuation):

\[ \Gamma \vdash \mathrm{cc} : (\neg A \to A) \to A \]

This operator is implemented in languages such as Scheme and C is a generalization
of it: we have $\mathrm{cc}(\lambda k.t) = \mathcal{C}(\lambda k.k\,t)$. Finally, double negation elimination
can also be implemented by the rule

\[ \frac{\Gamma, \neg A \vdash \bot}{\Gamma \vdash A} \]

which suggests the following variant of the calculus

\[ \frac{\Gamma, \alpha : \neg A \vdash t : \bot}{\Gamma \vdash \mu\alpha^{\neg A}.t : A} \]

This means that we now add a construction $\mu\alpha^{\neg A}.t$ to our terms which corresponds
to

\[ \mu\alpha^{\neg A}.t = \mathcal{C}(\lambda\alpha^{\neg A}.t) \]

in the previous calculus. In the next section, we will see an alternative calculus
based on similar ideas, though with nicer and more intuitive reduction rules.
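
OCaml has no built-in callcc, but the behavior of the operator cc discussed above can be approximated with the standard continuation-passing encoding. The following is only a sketch under that assumption, not the calculus of this section: the type cont and the functions return, bind, callcc and run are names introduced here. The type of callcc has the shape ((A → B) → A) → A, read inside the continuation monad.

```ocaml
(* A computation returning an 'a, in continuation-passing style,
   with final answer type 'r. *)
type ('a, 'r) cont = ('a -> 'r) -> 'r

let return (x : 'a) : ('a, 'r) cont = fun k -> k x

let bind (m : ('a, 'r) cont) (f : 'a -> ('b, 'r) cont) : ('b, 'r) cont =
  fun k -> m (fun a -> f a k)

(* The function passed to f discards its own continuation and jumps back
   to the continuation k which was current when callcc was called. *)
let callcc (f : ('a -> ('b, 'r) cont) -> ('a, 'r) cont) : ('a, 'r) cont =
  fun k -> f (fun a _ -> k a) k

let run (m : ('a, 'a) cont) : 'a = m (fun x -> x)

let () =
  (* Calling k aborts the rest of the body: the result is 1, not 2. *)
  assert (run (callcc (fun k -> bind (k 1) (fun () -> return 2))) = 1);
  (* If k is never called, the body returns normally. *)
  assert (run (callcc (fun _ -> return 2)) = 2)
```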


4.6.2 The $\lambda\mu$-calculus. Let us now introduce the $\lambda\mu$-calculus [Par92]. We
suppose fixed two sorts of variables: the term variables x, y, etc. which behave
as usual and the control variables $\alpha$, $\beta$, etc. which can be thought of as variables
of negated types. The terms are generated by the grammar

\[ t, u ::= x \mid t\,u \mid \lambda x.t \mid \mu\alpha.t \mid [\alpha]t \]

The first constructions are the usual ones from the $\lambda$-calculus. A term of the
form $\mu\alpha.t$ should be thought of as a term catching an exception named $\alpha$ and a
term $[\alpha]t$ as raising the exception $\alpha$ with the value t. The reduction will make
it so that the place where it is caught is replaced by t. For instance, we will
have a reduction

\[ t\,(\mu\alpha.u\,([\alpha]v)) \longrightarrow t\,v \]

meaning that during the evaluation of the argument of t, the exception $\alpha$ will
be raised with value v and will thus replace the term at the corresponding $\mu\alpha$.
The constructor $\mu$ is a binder and terms are considered modulo $\alpha$-equivalence:
$\mu\alpha.t = \mu\beta.(t[\beta/\alpha])$. Beware of the unfortunate similarity in notation between
raising and substitution.
The three reduction rules of the calculus are


— the usual $\beta$-reduction:

\[ (\lambda x.t)\,u \longrightarrow_\beta t[u/x] \]


— the following rule commuting applications and $\mu$-abstractions:

\[ (\mu\alpha.t)\,u \longrightarrow_\beta \mu\beta.t[[\beta]{-}u/[\alpha]{-}] \]

where the weird notation $[\beta]{-}u/[\alpha]{-}$ in the substitution means that we
should replace every subterm of t of the form $[\alpha]v$ by $[\beta](v\,u)$,


— the following reduction rule for $\mu$, stating that if we catch exceptions raised
on $\alpha$ and immediately re-raise on $\beta$, we might as well raise them directly
on $\beta$:

\[ [\beta](\mu\alpha.t) \longrightarrow_\beta t[\beta/\alpha] \]
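
The substitution used in the second rule can be sketched in OCaml as follows. This is an illustration under simplifying assumptions: variables are plain strings, bound names are supposed distinct from the free names of u (so that no capture can occur), and the new name beta is supplied by the caller and assumed fresh. The names ssubst and mu_app are ours.

```ocaml
type term =
  | Var of string
  | App of term * term
  | Abs of string * term
  | Mu of string * term      (* μα.t *)
  | Name of string * term    (* [α]t *)

(* Replace every subterm [alpha]v of t by [beta](v u). *)
let rec ssubst alpha beta u t =
  let s = ssubst alpha beta u in
  match t with
  | Var x -> Var x
  | App (t1, t2) -> App (s t1, s t2)
  | Abs (x, t) -> Abs (x, s t)
  | Mu (a, _) when a = alpha -> t        (* alpha is rebound: stop here *)
  | Mu (a, t) -> Mu (a, s t)
  | Name (a, t) when a = alpha -> Name (beta, App (s t, u))
  | Name (a, t) -> Name (a, s t)

(* Head reduction (μα.t) u → μβ.t[[β]−u/[α]−], with beta assumed fresh. *)
let mu_app beta = function
  | App (Mu (alpha, t), u) -> Some (Mu (beta, ssubst alpha beta u t))
  | _ -> None
```

For instance, mu_app "b" applied to App (Mu ("a", Name ("a", Var "v")), Var "u") yields Some (Mu ("b", Name ("b", App (Var "v", Var "u")))).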


Additionally, we require the following $\eta$-reduction rule, stating that if we catch
on $\alpha$ and immediately re-raise on $\alpha$, we might as well do nothing:

\[ \mu\alpha.[\alpha]t \longrightarrow_\eta t \]

when $\alpha \notin \mathrm{FV}(t)$. It is proved in [Par92] that

Theorem 4.6.2.1. The $\lambda\mu$-calculus is confluent.


Remark 4.6.2.2. The translation between the previous calculus based on C and
the $\lambda\mu$-calculus was already hinted at at the end of the previous section: $\mu\alpha.t$
corresponds to $\mathcal{C}(\lambda\alpha.t)$ and $[\alpha]$ corresponds to applying the argument given by C.
More formally, the operators cc and C can be encoded in the $\lambda\mu$-calculus as

\[
\begin{array}{rcl}
\mathrm{cc} & = & \lambda y.\mu\alpha.[\alpha](y\,(\lambda x.\mu\beta.[\alpha]x)) \\
\mathcal{C} & = & \lambda y.\mu\alpha.[\beta](y\,(\lambda x.\mu\gamma.[\alpha]x))
\end{array}
\]

The intuition is thus that $\mu\alpha.t$ corresponds to some sort of OCaml construction
creating an exception and catching it:


let exception Alpha of 'a in
try t with Alpha u -> u


and $[\alpha]u$ would correspond to raising the exception:

raise (Alpha u)


However, there are differences. First, the name of the exception is generated on
the fly instead of being hard-coded: we have an $\alpha$-conversion rule for $\mu$ binders.
More importantly, the exceptions can never escape their scope in $\lambda\mu$-calculus,
unlike in OCaml. For instance, consider the following program in OCaml:


let f : int -> int =
  let exception Alpha of (int -> int) in
  try fun n -> raise (Alpha (fun x -> n * x))
  with Alpha g -> g

let () = print_int (f 3)


Although the exception Alpha seems to be caught (the raise is surrounded by
a try ... with), executing the program results in


Fatal error: exception Alpha(_)


meaning that it was not the case: when executing f 3, f is replaced by its value
and the reduction raises the exception. The analogue of this program in $\lambda\mu$ is

\[ f = \mu\alpha.[\alpha](\lambda n.[\alpha](\lambda x.n \times x)) \]

(we allow ourselves to use integers and multiplication). It does not suffer from
this problem, and corresponds to a function which, when applied to an argument n,
turns into the function which multiplies by n. When we apply it to 3,
it thus turns into the function which multiplies its argument (which is 3) by 3
and the result will actually be 9 as expected:

\[
\begin{array}{rcl}
f\,3 & \longrightarrow & \mu\beta.[\beta]((\lambda n.[\beta]((\lambda x.n \times x)\,3))\,3) \\
& \longrightarrow & \mu\beta.[\beta][\beta]((\lambda x.3 \times x)\,3) \\
& \longrightarrow & \mu\beta.[\beta][\beta](3 \times 3)
\end{array}
\]

which is $\eta$-equivalent to $3 \times 3$ using the two $\eta$-conversion rules.

Another possible interpretation is that $\mu\alpha.t$ stores the current evaluation
context and $[\alpha]u$ restores the evaluation context of the corresponding $\mu\alpha$ before
executing u: it is as if the term t had never been executed. In the above example,
it is as if f had directly been defined as $\lambda x.n \times x$.
4.6.3 Classical logic as a typing system. In order to type the $\lambda\mu$-calculus,
we consider types of the form

\[ A, B ::= X \mid A \to B \mid \bot \]

We also consider a Church variant of the calculus, where $\lambda$- and $\mu$-abstracted
variables are decorated by their types. The sequents are of the form

\[ \Gamma \vdash t : A \mid \Delta \]

with t a term, A a type, and $\Gamma$ and $\Delta$ contexts of the form

\[ \Gamma = x_1 : A_1, \ldots, x_n : A_n \qquad\qquad \Delta = \alpha_1 : A_1, \ldots, \alpha_n : A_n \]

where the variables of $\Gamma$ are regular ones, whereas those of $\Delta$ are control variables.
Namely, $\Gamma$ provides the type of the free variables of t as usual, whereas $\Delta$
gives the type of exceptions that might be raised. Finally, A is the type of the
result of t, which might never be actually given if some exception is raised. In
particular, a term of type $\bot$ is called a command: we know that it will never
return a value, and thus necessarily raises some exception.

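The typing rules for $\lambda\mu$-calculus are

\[
\frac{}{\Gamma, x : A, \Gamma' \vdash x : A \mid \Delta}\,(\text{ax})
\]
\[
\frac{\Gamma \vdash t : A \to B \mid \Delta \quad \Gamma \vdash u : A \mid \Delta}{\Gamma \vdash t\,u : B \mid \Delta}\,(\to_E)
\qquad
\frac{\Gamma, x : A \vdash t : B \mid \Delta}{\Gamma \vdash \lambda x^A.t : A \to B \mid \Delta}\,(\to_I)
\]
\[
\frac{\Gamma \vdash t : \bot \mid \Delta, \alpha : A, \Delta'}{\Gamma \vdash \mu\alpha^A.t : A \mid \Delta, \Delta'}\,(\bot_E)
\qquad
\frac{\Gamma \vdash t : A \mid \Delta, \alpha : A, \Delta'}{\Gamma \vdash [\alpha]t : \bot \mid \Delta, \alpha : A, \Delta'}\,(\bot_I)
\]

The rule ($\bot_E$) says that a term $\mu\alpha^A.t$ of type A is built from a command t which
raises some value of type A on $\alpha$, and the rule ($\bot_I$) says that a term $[\alpha]t$ is a
command (of type $\bot$, not returning anything) and that the type A of the raised
term t has to match the one expected for $\alpha$.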


Exercise 4.6.3.1. Show that Peirce’s law

\[ ((A \to B) \to A) \to A \]

is the type of the following term:

\[ \lambda x^{(A \to B) \to A}.\mu\alpha^A.[\alpha](x\,(\lambda y^A.\mu\beta^B.[\alpha]y)) \]

It can be shown [Par92, Par97] that this system has the expected properties
which were detailed above for the case of simply-typed $\lambda$-calculus:

Theorem 4.6.3.2 (Subject reduction). If $\Gamma \vdash t : A \mid \Delta$ is derivable and $t \longrightarrow_\beta t'$
then $\Gamma \vdash t' : A \mid \Delta$ is also derivable.

Theorem 4.6.3.3 (Strong normalization). Typed $\lambda\mu$-terms are strongly normalizing.
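If we erase the terms from the rules, we obtain the following presentation of
classical logic:

\[
\frac{}{\Gamma, A, \Gamma' \vdash A, \Delta}\,(\text{ax})
\]
\[
\frac{\Gamma \vdash A \Rightarrow B, \Delta \quad \Gamma \vdash A, \Delta}{\Gamma \vdash B, \Delta}\,(\Rightarrow_E)
\qquad
\frac{\Gamma, A \vdash B, \Delta}{\Gamma \vdash A \Rightarrow B, \Delta}\,(\Rightarrow_I)
\]
\[
\frac{\Gamma \vdash \bot, \Delta, A, \Delta'}{\Gamma \vdash A, \Delta, \Delta'}\,(\bot_E)
\qquad
\frac{\Gamma \vdash A, \Delta, A, \Delta'}{\Gamma \vdash \bot, \Delta, A, \Delta'}\,(\bot_I)
\]

All the rules are the usual ones except for the rule ($\bot_I$) which combines weakening,
contraction and exchange:

\[
\dfrac{\dfrac{\dfrac{\Gamma \vdash A, \Delta, A, \Delta'}{\Gamma \vdash \Delta, A, A, \Delta'}\,(\text{xch})}{\Gamma \vdash \Delta, A, \Delta'}\,(\text{contr})}{\Gamma \vdash \bot, \Delta, A, \Delta'}\,(\text{wk})
\]

The list of formulas in $\Delta$ (nor $\Gamma$) is not supposed to be commutative, and introduction
and elimination rules always operate on the leftmost formula. During
proof search we can however put another formula of $\Delta$ on the left using elimination
and introduction rules for $\bot$, as shown on the left (and the corresponding
typing derivation is figured on the right):

\[
\dfrac{\dfrac{\Gamma \vdash B, A, \Delta, B, \Delta'}{\Gamma \vdash \bot, A, \Delta, B, \Delta'}\,(\bot_I)}{\Gamma \vdash A, \Delta, B, \Delta'}\,(\bot_E)
\qquad\qquad
\dfrac{\dfrac{\Gamma \vdash t : B \mid \alpha : A, \Delta, \beta : B, \Delta'}{\Gamma \vdash [\beta]t : \bot \mid \alpha : A, \Delta, \beta : B, \Delta'}\,(\bot_I)}{\Gamma \vdash \mu\alpha^A.[\beta]t : A \mid \Delta, \beta : B, \Delta'}\,(\bot_E)
\]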

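Adding the usual rules for coproducts, we can show the excluded middle as
follows in this setting (reading from top to bottom, each judgment is obtained
from the previous one by the indicated rule):

\[
\begin{array}{ll}
x : A \vdash x : A \mid \alpha : \neg A \lor A & (\text{ax}) \\
x : A \vdash \iota_r(x) : \neg A \lor A \mid \alpha : \neg A \lor A & (\lor_I^r) \\
x : A \vdash [\alpha]\iota_r(x) : \bot \mid \alpha : \neg A \lor A & (\bot_I) \\
\vdash \lambda x^A.[\alpha]\iota_r(x) : \neg A \mid \alpha : \neg A \lor A & (\to_I) \\
\vdash \iota_l(\lambda x^A.[\alpha]\iota_r(x)) : \neg A \lor A \mid \alpha : \neg A \lor A & (\lor_I^l) \\
\vdash [\alpha]\iota_l(\lambda x^A.[\alpha]\iota_r(x)) : \bot \mid \alpha : \neg A \lor A & (\bot_I) \\
\vdash \mu\alpha^{\neg A \lor A}.[\alpha]\iota_l(\lambda x^A.[\alpha]\iota_r(x)) : \neg A \lor A \mid & (\bot_E)
\end{array}
\]
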


In order to give a more concrete idea of this program, let us try to implement it in 
OCaml. Remember from section 1.5 that the empty type can be implemented 
as 


type bot 


and negation as 
type 'a neg = 'a -> bot


From those, the above term proving excluded middle can roughly be translated
as


let em () : (a neg, a) sum =
  let exception Alpha of (a neg, a) sum in
  try Left (fun x -> raise (Alpha (Right x)))
  with Alpha x -> x


As explained above, this does not behave exactly as it should in OCaml, because 
exceptions are not properly scoped there... 


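4.6.4 A more symmetric calculus. The reduction rule for $(\mu\alpha.t)\,u$ in the
$\lambda\mu$-calculus involves a slightly awkward substitution. In order to overcome this
defect and reveal the symmetry of terms and environments, Curien and Herbelin
have introduced a variant of the $\lambda\mu$-calculus called the $\lambda\mu\tilde\mu$-calculus [CH00]. In
this calculus there are three kinds of “terms”:

\[
\begin{array}{lrcl}
\text{terms:} & t & ::= & x \mid \lambda x.t \mid \mu\alpha.c \\
\text{environments:} & e & ::= & \alpha \mid t \cdot e \mid \tilde\mu x.c \\
\text{commands:} & c & ::= & \langle t \mid e \rangle
\end{array}
\]

with reduction rules

\[
\begin{array}{rcl}
\langle \lambda x.t \mid u \cdot e \rangle & \longrightarrow & \langle u \mid \tilde\mu x.\langle t \mid e \rangle \rangle \\
\langle \mu\alpha.c \mid e \rangle & \longrightarrow & c[e/\alpha] \\
\langle t \mid \tilde\mu x.c \rangle & \longrightarrow & c[t/x]
\end{array}
\]

The typing judgments are of the three possible forms

\[ \Gamma \vdash t : A \mid \Delta \qquad\qquad \Gamma \mid e : A \vdash \Delta \qquad\qquad c : (\Gamma \vdash \Delta) \]

and the rules are

\[
\frac{}{\Gamma, x : A \vdash x : A \mid \Delta}
\qquad
\frac{}{\Gamma \mid \alpha : A \vdash \alpha : A, \Delta}
\]
\[
\frac{\Gamma \vdash t : A \mid \Delta \quad \Gamma \mid e : B \vdash \Delta}{\Gamma \mid t \cdot e : A \to B \vdash \Delta}
\qquad
\frac{\Gamma, x : A \vdash t : B \mid \Delta}{\Gamma \vdash \lambda x.t : A \to B \mid \Delta}
\]
\[
\frac{c : (\Gamma, x : A \vdash \Delta)}{\Gamma \mid \tilde\mu x.c : A \vdash \Delta}
\qquad
\frac{c : (\Gamma \vdash \alpha : A, \Delta)}{\Gamma \vdash \mu\alpha.c : A \mid \Delta}
\]
\[
\frac{\Gamma \vdash t : A \mid \Delta \quad \Gamma \mid e : A \vdash \Delta}{\langle t \mid e \rangle : (\Gamma \vdash \Delta)}
\]

You are strongly encouraged to observe their beautiful symmetry and find out
their meaning by yourself. In particular, Lafont’s critical pair presented in
section 2.5.4 corresponds to the fact that the following term can reduce in two
different ways, showing that the calculus is not confluent (for good reasons!):

\[ c[\tilde\mu x.d/\alpha] \longleftarrow \langle \mu\alpha.c \mid \tilde\mu x.d \rangle \longrightarrow d[\mu\alpha.c/x] \]

In particular, if $\alpha$ is not free in c and x is not free in d, c and d are convertible...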


CHAPTER 5 


First-order logic 


First-order logic is an extension of propositional logic where propositions are 
allowed to depend on terms over some fixed signature, and are then called 
predicates. For instance, equality can be encoded as a predicate t = u which 
depends on two terms t and u. There are thus two worlds in play: the world of
logic, where formulas live, and the world of data, where terms live. This logic is 
the one traditionally considered in mathematics (in particular, we will see that 
it can be used to formally state the axioms of set theory). Good introductions 
on the subject include [CK90, CL93]. 

We define first-order logic in section 5.1, present some well-known first-order 
theories in section 5.2, and detail the particular case of set theory in section 5.3 
(including in the intuitionistic setting). Finally, the first-order unification algo- 
rithm is presented in section 5.4. 


5.1 Definition 


5.1.1 Signature. A signature $\Sigma$ is a set of function symbols together with a
function $a : \Sigma \to \mathbb{N}$ associating an arity to each symbol: f can be thought of as
a formal operation with a(f) inputs. In particular, symbols of arity 0 are called
constants.


5.1.2 Terms. We suppose fixed a countably infinite set $\mathcal{X}$ of variables. Given
a signature $\Sigma$, the set $\mathcal{T}_\Sigma$ of terms is the smallest set such that


— every variable is a term: $\mathcal{X} \subseteq \mathcal{T}_\Sigma$,


— terms are closed under operations: given $f \in \Sigma$ with a(f) = n and
$t_1, \ldots, t_n \in \mathcal{T}_\Sigma$, we have $f(t_1, \ldots, t_n) \in \mathcal{T}_\Sigma$.


This can also be stated as the fact that terms are generated by the grammar

\[ t ::= x \mid f(t_1, \ldots, t_n) \]

where x is a variable, f is a function symbol of arity n and the $t_i$ are terms. We often
implicitly suppose fixed a signature and simply write $\mathcal{T}$ instead of $\mathcal{T}_\Sigma$.


Example 5.1.2.1. Consider the signature $\Sigma = \{+ : 2, 0 : 0\}$. This notation means
that it contains two function symbols + and 0, whose arities are respectively
a(+) = 2 and a(0) = 0. Examples of terms over this signature are

\[ +(x, 0()) \qquad +(+(x, x), +(y, 0())) \qquad +(0(), 0()) \]

In the following, we generally omit parentheses for constants, e.g. write 0 instead
of 0().

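Given a term t, its set of subterms ST(t) is defined by induction on t by

\[
\mathrm{ST}(x) = \{x\}
\qquad\qquad
\mathrm{ST}(f(t_1, \ldots, t_n)) = \{f(t_1, \ldots, t_n)\} \cup \bigcup_{1 \leq i \leq n} \mathrm{ST}(t_i)
\]

We say that u is a subterm of t when $u \in \mathrm{ST}(t)$; it is a strict subterm when it
is distinct from t.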
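
The inductive definition of ST can be transcribed as follows, in a sketch where sets are represented by lists (possibly with repetitions) and the function names are ours.

```ocaml
type term = Var of string | Fun of string * term list

(* ST(x) = {x} and ST(f(t1,...,tn)) = {f(t1,...,tn)} ∪ ST(t1) ∪ ... ∪ ST(tn) *)
let rec subterms t =
  match t with
  | Var _ -> [ t ]
  | Fun (_, l) -> t :: List.concat_map subterms l

let is_subterm u t = List.mem u (subterms t)

let is_strict_subterm u t = u <> t && is_subterm u t
```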


5.1.3 Substitutions. A substitution is a function $\sigma : \mathcal{X} \to \mathcal{T}$ such that the set
$\{x \in \mathcal{X} \mid \sigma(x) \neq x\}$ is finite. Given a term t, we write $t[\sigma]$ for the term obtained
from t by replacing every variable x by $\sigma(x)$:

\[
x[\sigma] = \sigma(x)
\qquad\qquad
(f(t_1, \ldots, t_n))[\sigma] = f(t_1[\sigma], \ldots, t_n[\sigma])
\]

We sometimes write $\sigma = [t_1/x_1, \ldots, t_n/x_n]$ for the substitution such that $\sigma(x_i) = t_i$
for every $1 \leq i \leq n$ and $\sigma(x) = x$ when x is none of the $x_i$. A renaming is a
substitution such that the term $\sigma(x)$ is a variable, for every variable x.
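
Since a substitution differs from the identity on finitely many variables only, it can be represented by a finite association list. The following sketch (the type subst and the functions apply and is_renaming are names chosen here) implements the two equations defining $t[\sigma]$.

```ocaml
type term = Var of string | Fun of string * term list

(* The substitution [t1/x1, ..., tn/xn]: identity on the other variables. *)
type subst = (string * term) list

let rec apply (s : subst) = function
  | Var x -> (try List.assoc x s with Not_found -> Var x)
  | Fun (f, l) -> Fun (f, List.map (apply s) l)

(* A renaming sends every variable to a variable. *)
let is_renaming (s : subst) =
  List.for_all (function (_, Var _) -> true | _ -> false) s
```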


5.1.4 Formulas. We suppose fixed a set $\mathcal{P}$ of predicates (also sometimes called
relation symbols) together with a function $a : \mathcal{P} \to \mathbb{N}$ associating an arity to
each predicate. A formula A is an expression generated by the grammar

\[
A, B ::= P(t_1, \ldots, t_n) \mid A \Rightarrow B \mid A \land B \mid \top \mid A \lor B \mid \bot \mid \neg A \mid \forall x.A \mid \exists x.A
\]

where P is a predicate of arity n, the $t_i$ are terms, $x \in \mathcal{X}$ is a term variable and
A and B are formulas. Quantifiers bind the least tightly, e.g. $\forall x.A \land B$ is
implicitly bracketed as $\forall x.(A \land B)$ and not $(\forall x.A) \land B$. Note that the definition
of formulas depends both on the considered signature $\Sigma$ and the considered
set $\mathcal{P}$ of predicates: we sometimes say a formula on $(\Sigma, \mathcal{P})$ to make this precise,
although we generally leave it implicit.

Example 5.1.4.1. Consider the signature $\Sigma = \{\times : 2, 1 : 0\}$, which means that
we have two function symbols “$\times$” and “$1$”, with $\times$ of arity 2 and $1$ of arity 0.
We also suppose that $\mathcal{P}$ contains a predicate $=$ of arity 2. We have the formula

\[ \forall x.\exists y.(x \times y = 1 \land y \times x = 1) \]

which expresses that every element admits an inverse.

Example 5.1.4.2. With a predicate $D$ of arity one, the drinker formula is

\[ \exists x.(D(x) \Rightarrow (\forall y.D(y))) \]

The name of this formula comes from the following interpretation. If we see
terms as people in a pub and consider that $D(t)$ holds when $t$ drinks, it can be
read as:


There is someone in the pub such that, 
if he is drinking, then everyone in the pub is drinking. 


We will see in example 5.1.7.1 that this formula is classically true, but that it 
is not intuitionistically so. 

First order logic is an extension of propositional logic in the following sense.
Consider the empty signature $\Sigma = \emptyset$ and the set $\mathcal{P} = \mathcal{X}$ consisting of all propositional
variables, seen as predicates of arity 0. Then a propositional formula, such
as $X \lor \neg Y$, corresponds precisely to a first order formula, such as $X() \lor \neg Y()$.

5.1.5 Bound and free variables. In a formula of the form $\forall x.A$ or $\exists x.A$, the
variable $x$ is said to be bound in $A$. This means that the name of the variable $x$
does not really matter and we could have renamed it to some other variable
name, without changing the formula. We thus implicitly consider formulas up
to proper (or capture avoiding) renaming of variables (by “proper”, we mean
here that we should take care not to rename a variable to some already bound
variable name). For instance, we consider that the two formulas

\[ \forall x.\exists y.x + y = x \qquad\text{and}\qquad \forall z.\exists y.z + y = z \]

are the same (the second is obtained from the first by renaming $x$ to $z$), but
they are different from the formula

\[ \forall x.\exists x.x + x = x \]

obtained by an “improper” renaming of $y$ into $x$ which was already bound.
Such a mechanism for renaming bound variables is detailed in section 3.1, for
the $\lambda$-calculus.

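A variable which is not bound is said to be free and we write $\mathrm{FV}(A)$ for the
set of free variables of a formula $A$. This is formally defined by

\[
\begin{aligned}
\mathrm{FV}(P(t_1,\ldots,t_n)) &= \mathrm{FV}(t_1) \cup \ldots \cup \mathrm{FV}(t_n) \\
\mathrm{FV}(A \Rightarrow B) = \mathrm{FV}(A \land B) = \mathrm{FV}(A \lor B) &= \mathrm{FV}(A) \cup \mathrm{FV}(B) \\
\mathrm{FV}(\top) = \mathrm{FV}(\bot) &= \emptyset \\
\mathrm{FV}(\neg A) &= \mathrm{FV}(A) \\
\mathrm{FV}(\forall x.A) = \mathrm{FV}(\exists x.A) &= \mathrm{FV}(A) \setminus \{x\}
\end{aligned}
\]

where, given a term $t$, we write $\mathrm{FV}(t)$ for the set of all the variables occurring
in $t$. A formula $A$ is closed when it has no free variables, i.e. $\mathrm{FV}(A) = \emptyset$. We
sometimes write

\[ A(x_1,\ldots,x_n) \]

for a formula $A$ whose free variables are among $x_1,\ldots,x_n$. In this case, we write
$A(t_1,\ldots,t_n)$ instead of $A[t_1/x_1,\ldots,t_n/x_n]$.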

Given a formula $A$ and a term $t$, we write $A[t/x]$ for the formula $A$ where all
the free occurrences of $x$ have been substituted by $t$ avoiding captures, i.e. we
suppose that all bound variables are different from the variables of $t$. For instance,
with $A$ being

\[ (\exists y.x + x = y) \lor (\exists x.x = y) \]

we have that $A[z + z/x]$ is

\[ (\exists y.(z + z) + (z + z) = y) \lor (\exists x.x = y) \]

but in order to compute $A[y + y/x]$, we have to rename the bound variable $y$
(say, to $z$) and the result will be

\[ (\exists z.(y + y) + (y + y) = z) \lor (\exists x.x = y) \]

and not

\[ (\exists y.(y + y) + (y + y) = y) \lor (\exists x.x = y) \]
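
Free variables and capture-avoiding substitution can be sketched in OCaml as follows. This is only one possible implementation: the constructor names, the helper fresh (which builds a new name by appending primes) and the string representation of symbols are assumptions made for the illustration.

```ocaml
type term = Var of string | Fn of string * term list

type formula =
  | Pred of string * term list
  | Imp of formula * formula
  | And of formula * formula
  | Or of formula * formula
  | Not of formula
  | True
  | False
  | Forall of string * formula
  | Exists of string * formula

let rec fv_term = function
  | Var x -> [x]
  | Fn (_, ts) -> List.concat_map fv_term ts

(* Free variables of a formula (possibly with repetitions). *)
let rec fv = function
  | Pred (_, ts) -> List.concat_map fv_term ts
  | Imp (a, b) | And (a, b) | Or (a, b) -> fv a @ fv b
  | Not a -> fv a
  | True | False -> []
  | Forall (x, a) | Exists (x, a) -> List.filter (fun y -> y <> x) (fv a)

let rec subst_term t x = function
  | Var y -> if y = x then t else Var y
  | Fn (f, ts) -> Fn (f, List.map (subst_term t x) ts)

(* A variable name not occurring in [avoid], obtained by adding primes. *)
let rec fresh avoid x = if List.mem x avoid then fresh avoid (x ^ "'") else x

(* a[t/x]: capture-avoiding substitution of x by t in a. *)
let rec subst t x a =
  let binder mk y b =
    if y = x then mk y b  (* x is bound here: nothing to substitute *)
    else if List.mem y (fv_term t) then begin
      (* y would capture a variable of t: rename it first *)
      let y' = fresh (x :: fv_term t @ fv b) y in
      mk y' (subst t x (subst (Var y') y b))
    end else mk y (subst t x b)
  in
  match a with
  | Pred (p, ts) -> Pred (p, List.map (subst_term t x) ts)
  | Imp (a, b) -> Imp (subst t x a, subst t x b)
  | And (a, b) -> And (subst t x a, subst t x b)
  | Or (a, b) -> Or (subst t x a, subst t x b)
  | Not a -> Not (subst t x a)
  | True -> True
  | False -> False
  | Forall (y, b) -> binder (fun y b -> Forall (y, b)) y b
  | Exists (y, b) -> binder (fun y b -> Exists (y, b)) y b

(* The formula (exists y. x + x = y) \/ (exists x. x = y) used in the text. *)
let plus a b = Fn ("+", [a; b])
let eq a b = Pred ("=", [a; b])
let a_ex =
  Or (Exists ("y", eq (plus (Var "x") (Var "x")) (Var "y")),
      Exists ("x", eq (Var "x") (Var "y")))
```

Computing subst (plus (Var "y") (Var "y")) "x" a_ex renames the bound variable y of the left disjunct (here to y') before substituting, and leaves the right disjunct unchanged since x is bound there.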
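5.1.6 Natural deduction rules. The rules for first order logic in intuitionistic
natural deduction are the usual ones (see figure 2.1) together with the following
introduction and elimination rules for universal and existential quantification:

\[
\frac{\Gamma \vdash \forall x.A}{\Gamma \vdash A[t/x]}\,(\forall_E)
\qquad
\frac{\Gamma \vdash A}{\Gamma \vdash \forall x.A}\,(\forall_I)
\]

\[
\frac{\Gamma \vdash \exists x.A \qquad \Gamma, A \vdash B}{\Gamma \vdash B}\,(\exists_E)
\qquad
\frac{\Gamma \vdash A[t/x]}{\Gamma \vdash \exists x.A}\,(\exists_I)
\]

These rules are subject to the following (important) side conditions:
— in $(\forall_I)$, we suppose $x \notin \mathrm{FV}(\Gamma)$,
— in $(\exists_E)$, we suppose $x \notin \mathrm{FV}(\Gamma) \cup \mathrm{FV}(B)$.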


where, given a context $\Gamma = x_1 : A_1,\ldots,x_n : A_n$, we have

\[ \mathrm{FV}(\Gamma) = \mathrm{FV}(A_1) \cup \ldots \cup \mathrm{FV}(A_n) \]


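Example 5.1.6.1. We have

\[
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{}{\forall x.\neg A, \exists x.A \vdash \exists x.A}\,(\mathrm{ax})
      \qquad
      \dfrac{
        \dfrac{\dfrac{}{\forall x.\neg A, \exists x.A, A \vdash \forall x.\neg A}\,(\mathrm{ax})}{\forall x.\neg A, \exists x.A, A \vdash \neg A}\,(\forall_E)
        \qquad
        \dfrac{}{\forall x.\neg A, \exists x.A, A \vdash A}\,(\mathrm{ax})
      }{\forall x.\neg A, \exists x.A, A \vdash \bot}\,(\neg_E)
    }{\forall x.\neg A, \exists x.A \vdash \bot}\,(\exists_E)
  }{\forall x.\neg A \vdash \neg(\exists x.A)}\,(\neg_I)
}{\vdash (\forall x.\neg A) \Rightarrow \neg(\exists x.A)}\,(\Rightarrow_I)
\]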


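Remark 5.1.6.2. The side conditions avoid clearly problematic proofs such as

\[
\dfrac{\dfrac{\dfrac{\dfrac{\dfrac{}{A(x) \vdash A(x)}\,(\mathrm{ax})}{A(x) \vdash \forall x.A(x)}\,(\forall_I)}{\vdash A(x) \Rightarrow \forall x.A(x)}\,(\Rightarrow_I)}{\vdash \forall x.(A(x) \Rightarrow \forall x.A(x))}\,(\forall_I)}{\vdash A(t) \Rightarrow \forall x.A(x)}\,(\forall_E)
\]

which can be read as: if the formula $A$ holds for some term $t$ then it holds for
any term. The problematic rule is the $(\forall_I)$ just after the (ax) rule: $x$ is not
fresh, since it occurs free in the context $A(x)$.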


Properties of the calculus. We do not detail this here, but the usual properties
of natural deduction generalize to first order logic. In particular, the structural
rules (contraction, exchange, weakening) are admissible, see section 2.2.7. We
will also see in section 5.1.9 that cuts can be eliminated.

5.1.7 Classical first order logic. Following section 2.5, classical first order
logic is the system obtained from the above one by adding one of the following
rules

\[
\frac{}{\Gamma \vdash \neg A \lor A}\,(\mathrm{lem})
\qquad
\frac{\Gamma \vdash \neg\neg A}{\Gamma \vdash A}\,(\neg\neg_E)
\qquad
\frac{\Gamma, \neg A \vdash A}{\Gamma \vdash A}\,(\mathrm{raa})
\]

implementing the excluded middle, the elimination of double negation or Clavius’
law (we could also have added any of the axioms of theorem 2.5.1.1).
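Example 5.1.7.1. A typical formula which is provable in classical logic (and not
in intuitionistic logic) is

\[ A = \exists x.(D(x) \Rightarrow (\forall y.D(y))) \]

already presented in example 5.1.4.2. A proof is the following:

\[
\dfrac{\dfrac{\dfrac{\dfrac{\dfrac{\dfrac{\dfrac{
  \dfrac{}{\neg A, D(x), \neg D(y) \vdash \neg A}\,(\mathrm{ax})
  \qquad
  \dfrac{\dfrac{\dfrac{\dfrac{
    \dfrac{}{\ldots, \neg D(y), D(y) \vdash \neg D(y)}\,(\mathrm{ax})
    \qquad
    \dfrac{}{\ldots, \neg D(y), D(y) \vdash D(y)}\,(\mathrm{ax})
  }{\neg A, D(x), \neg D(y), D(y) \vdash \bot}\,(\neg_E)
  }{\neg A, D(x), \neg D(y), D(y) \vdash \forall y.D(y)}\,(\bot_E)
  }{\neg A, D(x), \neg D(y) \vdash D(y) \Rightarrow (\forall y.D(y))}\,(\Rightarrow_I)
  }{\neg A, D(x), \neg D(y) \vdash \exists x.(D(x) \Rightarrow (\forall y.D(y)))}\,(\exists_I)
}{\neg A, D(x), \neg D(y) \vdash \bot}\,(\neg_E)
}{\neg A, D(x) \vdash \neg\neg D(y)}\,(\neg_I)
}{\neg A, D(x) \vdash D(y)}\,(\neg\neg_E)
}{\neg A, D(x) \vdash \forall y.D(y)}\,(\forall_I)
}{\neg A \vdash D(x) \Rightarrow (\forall y.D(y))}\,(\Rightarrow_I)
}{\neg A \vdash \exists x.(D(x) \Rightarrow (\forall y.D(y)))}\,(\exists_I)
}{\vdash \exists x.(D(x) \Rightarrow (\forall y.D(y)))}\,(\mathrm{raa})
\]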


If we interpret x as ranging over the people present in a pub, and the predi- 
cate D(x) as “x drinks” this formula states that there is a “universal drinker”, 
i.e. somebody such that if he drinks then everybody drinks. We can imagine 
why this formula cannot be proved intuitionistically: if it were, we would
be able to come up with an explicit name for this person, see theorem 5.1.9.3,
which seems impossible in the absence of further information on the pub. We do
not actually prove that there exists x such that D(x), which would require us to 
come up with an explicit witness for x, but only show that it cannot be the case 
that there is no x satisfying D, which is enough to conclude by double negation 
elimination. 


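Example 5.1.7.2. Another formula provable in classical logic is the formula

\[ \neg(\forall x.\neg A(x)) \Rightarrow \exists x.A(x) \]

which states that if it is not the case that every element $x$ does not satisfy $A(x)$,
then we can actually produce an element which satisfies $A(x)$. It can be proved
as follows:

\[
\dfrac{\dfrac{\dfrac{\dfrac{
  \dfrac{}{\ldots \vdash \neg\forall x.\neg A(x)}\,(\mathrm{ax})
  \qquad
  \dfrac{\dfrac{\dfrac{
    \dfrac{}{\ldots \vdash \neg\exists x.A(x)}\,(\mathrm{ax})
    \qquad
    \dfrac{\dfrac{}{\ldots \vdash A(x)}\,(\mathrm{ax})}{\ldots \vdash \exists x.A(x)}\,(\exists_I)
  }{\neg\forall x.\neg A(x), \neg\exists x.A(x), A(x) \vdash \bot}\,(\neg_E)
  }{\neg\forall x.\neg A(x), \neg\exists x.A(x) \vdash \neg A(x)}\,(\neg_I)
  }{\neg\forall x.\neg A(x), \neg\exists x.A(x) \vdash \forall x.\neg A(x)}\,(\forall_I)
}{\neg\forall x.\neg A(x), \neg\exists x.A(x) \vdash \bot}\,(\neg_E)
}{\neg\forall x.\neg A(x) \vdash \neg\neg\exists x.A(x)}\,(\neg_I)
}{\neg\forall x.\neg A(x) \vdash \exists x.A(x)}\,(\neg\neg_E)
}{\vdash \neg(\forall x.\neg A(x)) \Rightarrow \exists x.A(x)}\,(\Rightarrow_I)
\]

As in example 5.1.7.1, it is enough to show that it is not the case that there is
no $x$ satisfying $A$.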
Exercise 5.1.7.3. Another proof for the drinker formula of example 5.1.7.1 is the 
following. We have two possibilities for the pub: 
— either everybody drinks: in this case, we can take anybody as the universal 
drinker, 


— otherwise, there is someone who does not drink: we can take him as 
universal drinker. 


Formalize this reasoning in natural deduction. 


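De Morgan laws. In addition to the equivalences already shown in section 2.5.5,
the following de Morgan laws hold in classical first-order logic:

\[
\begin{array}{ll}
(\forall x.A) \land B \Leftrightarrow \forall x.(A \land B) & B \land (\forall x.A) \Leftrightarrow \forall x.(B \land A) \\
(\forall x.A) \lor B \Leftrightarrow \forall x.(A \lor B) & B \lor (\forall x.A) \Leftrightarrow \forall x.(B \lor A) \\
(\forall x.A) \Rightarrow B \Leftrightarrow \exists x.(A \Rightarrow B) & B \Rightarrow (\forall x.A) \Leftrightarrow \forall x.(B \Rightarrow A) \\
(\exists x.A) \land B \Leftrightarrow \exists x.(A \land B) & B \land (\exists x.A) \Leftrightarrow \exists x.(B \land A) \\
(\exists x.A) \lor B \Leftrightarrow \exists x.(A \lor B) & B \lor (\exists x.A) \Leftrightarrow \exists x.(B \lor A) \\
(\exists x.A) \Rightarrow B \Leftrightarrow \forall x.(A \Rightarrow B) & B \Rightarrow (\exists x.A) \Leftrightarrow \exists x.(B \Rightarrow A)
\end{array}
\]

whenever $x \notin \mathrm{FV}(B)$, as well as

\[ \neg(\forall x.A) \Leftrightarrow \exists x.\neg A \qquad \neg(\exists x.A) \Leftrightarrow \forall x.\neg A \]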


Prenex form. A formula $P$ is in prenex form when it is of the form

\[ P ::= \forall x.P \mid \exists x.P \mid A \]

where $A$ is a formula which does not contain any first-order quantification: a
formula in prenex form thus consists of a sequence of universal and existential
quantifications over a formula without quantifications. By using the above de
Morgan laws from left to right, one can show that


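Lemma 5.1.7.4. Every formula is equivalent to a formula in prenex form.

Example 5.1.7.5. The formula of example 5.1.7.2 can be put into prenex form
as follows:

\[
\begin{array}{rl}
& \neg(\forall x.\neg A(x)) \Rightarrow \exists x.A(x) \\
\Leftrightarrow & (\exists x.\neg\neg A(x)) \Rightarrow \exists y.A(y) \\
\Leftrightarrow & \forall x.(\neg\neg A(x) \Rightarrow \exists y.A(y)) \\
\Leftrightarrow & \forall x.\exists y.(\neg\neg A(x) \Rightarrow A(y))
\end{array}
\]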


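More de Morgan laws. In addition to the above equivalences, we also have

\[
\begin{array}{ll}
\forall x.(A \land B) \Leftrightarrow (\forall x.A) \land (\forall x.B) & \exists x.(A \lor B) \Leftrightarrow (\exists x.A) \lor (\exists x.B) \\
\forall x.\top \Leftrightarrow \top & \exists x.\bot \Leftrightarrow \bot
\end{array}
\]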


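5.1.8 Sequent calculus rules. The rules for first-order quantifiers in classical
sequent calculus are

\[
\frac{\Gamma, \forall x.A, A[t/x] \vdash \Delta}{\Gamma, \forall x.A \vdash \Delta}\,(\forall_L)
\qquad
\frac{\Gamma \vdash A, \Delta}{\Gamma \vdash \forall x.A, \Delta}\,(\forall_R)
\]

\[
\frac{\Gamma, A \vdash \Delta}{\Gamma, \exists x.A \vdash \Delta}\,(\exists_L)
\qquad
\frac{\Gamma \vdash A[t/x], \exists x.A, \Delta}{\Gamma \vdash \exists x.A, \Delta}\,(\exists_R)
\]

with the side condition for $(\forall_R)$ and $(\exists_L)$ that $x \notin \mathrm{FV}(\Gamma) \cup \mathrm{FV}(\Delta)$. Intuitionistic
rules are obtained, as usual, by restricting to sequents with one formula on the
right:

\[
\frac{\Gamma, \forall x.A, A[t/x] \vdash B}{\Gamma, \forall x.A \vdash B}\,(\forall_L)
\qquad
\frac{\Gamma \vdash A}{\Gamma \vdash \forall x.A}\,(\forall_R)
\]

\[
\frac{\Gamma, A \vdash B}{\Gamma, \exists x.A \vdash B}\,(\exists_L)
\qquad
\frac{\Gamma \vdash A[t/x]}{\Gamma \vdash \exists x.A}\,(\exists_R)
\]

with the expected side conditions for $(\forall_R)$ and $(\exists_L)$.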


Remark 5.1.8.1. In the rules $(\forall_L)$ and $(\exists_R)$, we have been careful to keep a copy
of the hypothesis: with this formulation, contraction is admissible.


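Example 5.1.8.2. The drinker formula from example 5.1.4.2 can be proved classically,
for instance by the following derivation, where we write $A$ for the formula $\exists x.(D(x) \Rightarrow (\forall y.D(y)))$:

\[
\dfrac{\dfrac{\dfrac{\dfrac{\dfrac{\dfrac{}{D(x), D(y) \vdash D(y), \forall y.D(y), A}\,(\mathrm{ax})}{D(x) \vdash D(y), D(y) \Rightarrow (\forall y.D(y)), A}\,(\Rightarrow_R)}{D(x) \vdash D(y), A}\,(\exists_R)}{D(x) \vdash \forall y.D(y), A}\,(\forall_R)}{\vdash D(x) \Rightarrow (\forall y.D(y)), A}\,(\Rightarrow_R)}{\vdash A}\,(\exists_R)
\]
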
As noted in the previous remark, we need to use the proved formula twice, and
it is thus crucial that we keep a copy of it in the rule $(\exists_R)$ at the bottom.

5.1.9 Cut elimination. The properties and proof techniques developed in section 2.3
extend to first order natural deduction, allowing us to prove that it has
the cut elimination property:

Theorem 5.1.9.1. A sequent $\Gamma \vdash A$ admits a proof if and only if it admits a cut
free proof.

In the cut elimination procedure, there are two new cases, which can be handled
as follows:

\[
\dfrac{\dfrac{\dfrac{\pi}{\Gamma \vdash A(x)}}{\Gamma \vdash \forall x.A(x)}\,(\forall_I)}{\Gamma \vdash A(t)}\,(\forall_E)
\quad\rightsquigarrow\quad
\dfrac{\pi[t/x]}{\Gamma \vdash A(t)}
\]

\[
\dfrac{\dfrac{\dfrac{\pi}{\Gamma \vdash A(t)}}{\Gamma \vdash \exists x.A(x)}\,(\exists_I) \qquad \dfrac{\pi'}{\Gamma, A(x) \vdash B}}{\Gamma \vdash B}\,(\exists_E)
\quad\rightsquigarrow\quad
\dfrac{\pi'[t/x][\pi/A]}{\Gamma \vdash B}
\]

Above, $\pi[t/x]$ stands for the proof $\pi$ where all the free occurrences of the variable
$x$ have been replaced by the term $t$ (details left to the reader). As in the
case of propositional logic, it can be shown that a cut-free proof of a formula in an
empty context necessarily ends with an introduction rule (proposition 2.3.3.2),
from which we deduce (as in theorem 2.3.4.2):

Theorem 5.1.9.2 (Consistency). First order (intuitionistic or classical) natural
deduction is consistent: there is no proof of $\vdash \bot$.

Another important consequence is that the logic has the existence property:
if we can prove that there exists a term satisfying some property, then we can
actually construct such a term:

Theorem 5.1.9.3 (Existence property). A formula of the form $\exists x.A$ is provable
in intuitionistic first order natural deduction if and only if there exists a term $t$
such that $A[t/x]$ is provable.


Proof. For the left-to-right implication, if we have a proof of $\exists x.A$ then, by
theorem 5.1.9.1, we have a cut-free one which, by proposition 2.3.3.2, ends with
an introduction rule, i.e. is of the form

\[ \dfrac{\dfrac{\pi}{\vdash A[t/x]}}{\vdash \exists x.A}\,(\exists_I) \]

We therefore have a proof $\pi$ of $A[t/x]$ for some term $t$. The right-to-left implication
is given by an application of the rule $(\exists_I)$. □


In contrast, we do not expect this property to hold in classical logic. For instance,
consider the drinker formula of example 5.1.7.1. We can feel that the
proof we have given is not constructive: there is no way of determining who is
the drinker in general (i.e. without performing a reasoning specific to the bar in
which we currently are).

5.1.10 Eigenvariables. The logic as we have presented it (which is the way it
is traditionally presented) suffers from a defect: in a given sequent, we do not
know the first order variables which are used. This is the subtle cause of some
surprising proofs. For instance, we can prove that there always exists a term,
whereas we would expect the empty set to be a perfectly reasonable way of
interpreting logic, for instance when the signature is empty:

\[ \dfrac{\dfrac{}{\vdash \top}\,(\top_I)}{\vdash \exists x.\top}\,(\exists_I) \]

Note that in the premise of the $(\exists_I)$, we use the fact that $\top = \top[x/x]$, i.e. we
use $x$ as witness for the existence. A variation on the previous example is the
following proof, which expresses the fact that if a property $A$ is satisfied for
every term $x$, then we can exhibit a term satisfying $A$. Again, we would have
expected that this is not true if there is no element in the model, and moreover,
this does not feel very constructive:

\[
\dfrac{\dfrac{\dfrac{\dfrac{}{\forall x.A \vdash \forall x.A}\,(\mathrm{ax})}{\forall x.A \vdash A}\,(\forall_E)}{\forall x.A \vdash \exists x.A}\,(\exists_I)}{\vdash (\forall x.A) \Rightarrow \exists x.A}\,(\Rightarrow_I)
\]

Here also, in the premise of the $(\exists_I)$ rule, we use the fact that $A = A[x/x]$,
i.e. we use $x$ as witness for the existence. We will see in section 5.2.3 that this
is the reason why models are usually supposed to be non-empty, while there is
no good reason to exclude this particular case.

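In order to fix that, we should keep track of the variables which are declared
in the context, which are sometimes called eigenvariables. This can be done by
adding a new context $\Xi$ to our sequents, which is a list of first order variables
which are declared. We thus consider sequents of the form

\[ \Xi \mid \Gamma \vdash A \]

the vertical bar being there to mark the delimitation between the context of
eigenvariables and the traditional context. The rules for logical connectives
simply “propagate” the new context $\Xi$, e.g. the rules for conjunction become

\[
\frac{\Xi \mid \Gamma \vdash A \land B}{\Xi \mid \Gamma \vdash A}\,(\land_E^l)
\qquad
\frac{\Xi \mid \Gamma \vdash A \land B}{\Xi \mid \Gamma \vdash B}\,(\land_E^r)
\qquad
\frac{\Xi \mid \Gamma \vdash A \qquad \Xi \mid \Gamma \vdash B}{\Xi \mid \Gamma \vdash A \land B}\,(\land_I)
\]

More interestingly, the rules for first order quantifiers become

\[
\frac{\Xi \mid \Gamma \vdash \forall x.A}{\Xi \mid \Gamma \vdash A[t/x]}\,(\forall_E)
\qquad
\frac{\Xi, x \mid \Gamma \vdash A}{\Xi \mid \Gamma \vdash \forall x.A}\,(\forall_I)
\]

\[
\frac{\Xi \mid \Gamma \vdash \exists x.A \qquad \Xi, x \mid \Gamma, A \vdash B}{\Xi \mid \Gamma \vdash B}\,(\exists_E)
\qquad
\frac{\Xi \mid \Gamma \vdash A[t/x]}{\Xi \mid \Gamma \vdash \exists x.A}\,(\exists_I)
\]

where we suppose

— $x \notin \Xi$ in $(\forall_I)$ and $(\exists_E)$,

— $\mathrm{FV}(t) \subseteq \Xi$ in $(\forall_E)$ and $(\exists_I)$.

Finally, the axiom rule and the truth introduction rule become

\[
\frac{\Xi \mid \Gamma, A, \Gamma' \vdash}{\Xi \mid \Gamma, A, \Gamma' \vdash A}\,(\mathrm{ax})
\qquad
\frac{\Xi \mid \Gamma \vdash}{\Xi \mid \Gamma \vdash \top}\,(\top_I)
\]

where $\Xi \mid \Gamma \vdash$ is a notation to mean that we suppose $\mathrm{FV}(\Gamma) \subseteq \Xi$. Supposing
this for these two rules (which are the only two without premise) is enough
to ensure that whenever we prove a sequent $\Xi \mid \Gamma \vdash A$, we will always have
$\mathrm{FV}(\Gamma) \cup \mathrm{FV}(A) \subseteq \Xi$ (it is easy to check that the inference rules preserve this
invariant).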


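Example 5.1.10.1. We can still prove $(\forall x.A) \Rightarrow \forall x.A$ in this new system:

\[
\dfrac{\dfrac{\dfrac{\dfrac{}{x \mid \forall x.A \vdash \forall x.A}\,(\mathrm{ax})}{x \mid \forall x.A \vdash A[x/x]}\,(\forall_E)}{\mid \forall x.A \vdash \forall x.A}\,(\forall_I)}{\mid\; \vdash (\forall x.A) \Rightarrow \forall x.A}\,(\Rightarrow_I)
\]

Example 5.1.10.2. We cannot prove $\exists x.\top$ in this system. In particular, the proof

\[ \dfrac{\dfrac{}{\mid\; \vdash \top[x/x]}\,(\top_I)}{\mid\; \vdash \exists x.\top}\,(\exists_I) \]

is not valid because the side condition is not satisfied for the rule $(\exists_I)$: the
witness $x$ is not declared in the (empty) context of eigenvariables.

Exercise 5.1.10.3. Show that the formula $(\forall x.\bot) \Rightarrow \bot$ is provable with traditional
rules, but not with the rules presented in this section.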


5.1.11 Curry-Howard. The Curry-Howard correspondence can be extended
to first-order logic, following the intuition that

— a proof of $\forall x.A$ should be a function which, when applied to a term $t$,
returns a proof that $A$ is valid for this term,

— a proof of $\exists x.A$ should be a pair consisting of a term $t$ and a proof that
$A$ is valid for this term.


Expressions. We begin with the language for proofs introduced in chapter 4, the
simply typed $\lambda$-calculus. In this section, we call its terms expressions in order
not to confuse them with first order terms, and write $e$ for an expression. The
syntax for expressions is thus

\[ e, e' ::= \lambda x^A.e \mid e\,e' \mid \ldots \]

In order to account for first order logic, we extend expressions with the following
constructions:

\[ e ::= \ldots \mid \Lambda x.e \mid e\,t \mid \langle t, e\rangle \mid \mathrm{unpair}(e, xy \mapsto e') \]


The newly added constructions are

— $\Lambda x.e$: a function taking a term as argument $x$ and returning an expression $e$,

— $e\,t$: the application of an expression (typically a function as above) to a
term $t$,

— $\langle t, e\rangle$: a pair consisting of a term $t$ and an expression $e$,

— $\mathrm{unpair}(e, xy \mapsto e')$: the extraction of the components $x$ and $y$ of a pair $e$
for use in an expression $e'$, which would be written in a syntax closer to
OCaml

    let (x,y) = e in e'

We insist on the fact that there are two kinds of abstractions, respectively written
$\lambda$ and $\Lambda$, and two kinds of applications, which are distinct constructions.
Similarly, for products, there are two kinds of pairings and of eliminators. Although
they behave similarly, they are entirely distinct constructions. However,
we will be able to unify those constructions when going to dependent types in
chapter 8: there will be one kind of abstraction (resp. pairing) which covers
both cases.


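Typing rules. The associated typing rules are

\[
\frac{\Gamma \vdash e : \forall x.A}{\Gamma \vdash e\,t : A[t/x]}\,(\forall_E)
\qquad
\frac{\Gamma \vdash e : A}{\Gamma \vdash \Lambda x.e : \forall x.A}\,(\forall_I)
\]

\[
\frac{\Gamma \vdash e : \exists x.A \qquad \Gamma, y : A \vdash e' : B}{\Gamma \vdash \mathrm{unpair}(e, xy \mapsto e') : B}\,(\exists_E)
\qquad
\frac{\Gamma \vdash e : A[t/x]}{\Gamma \vdash \langle t, e\rangle : \exists x.A}\,(\exists_I)
\]

and can be read as follows:

— $(\forall_I)$: a proof of $\forall x.A$ is a function which takes a term $x$ as argument and
returns a proof of $A$,

— $(\forall_E)$: using a proof of $\forall x.A$ consists in applying it to a term $t$,

— $(\exists_I)$: a proof of $\exists x.A(x)$ is a pair consisting of a term $t$ and a proof that
$A(t)$ is satisfied,

— $(\exists_E)$: we can use a proof of $\exists x.A$ by extracting its components.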


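Example 5.1.11.1. Consider again the derivation of example 5.1.6.1. It can be
decorated with expressions as follows:

\[
\dfrac{\dfrac{\dfrac{
  \dfrac{}{f : \forall x.\neg A, e : \exists x.A \vdash e : \exists x.A}\,(\mathrm{ax})
  \qquad
  \dfrac{
    \dfrac{\dfrac{}{f : \forall x.\neg A, e : \exists x.A, a : A \vdash f : \forall x.\neg A}\,(\mathrm{ax})}{f : \forall x.\neg A, e : \exists x.A, a : A \vdash f\,x : \neg A}\,(\forall_E)
    \qquad
    \dfrac{}{f : \forall x.\neg A, e : \exists x.A, a : A \vdash a : A}\,(\mathrm{ax})
  }{f : \forall x.\neg A, e : \exists x.A, a : A \vdash f\,x\,a : \bot}\,(\neg_E)
}{f : \forall x.\neg A, e : \exists x.A \vdash \mathrm{unpair}(e, xa \mapsto f\,x\,a) : \bot}\,(\exists_E)
}{f : \forall x.\neg A \vdash \lambda e^{\exists x.A}.\mathrm{unpair}(e, xa \mapsto f\,x\,a) : \neg(\exists x.A)}\,(\neg_I)
}{\vdash \lambda f^{\forall x.\neg A}.\lambda e^{\exists x.A}.\mathrm{unpair}(e, xa \mapsto f\,x\,a) : (\forall x.\neg A) \Rightarrow \neg(\exists x.A)}\,(\Rightarrow_I)
\]

The corresponding proof term is thus

\[ \lambda f^{\forall x.\neg A}.\lambda e^{\exists x.A}.\mathrm{unpair}(e, xa \mapsto f\,x\,a) \]

This function takes two arguments:

— $f$ of type $\forall x.A \Rightarrow \bot$, and

— $e$ of type $\exists x.A$

and produces a value of type $\bot$ by extracting from $e$ a term $x$ and a proof $a$
of $A(x)$, and applying $f$ to $x$ and $a$.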
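
The shape of this proof term can be mimicked in OCaml, at the price of forgetting the dependency of $A$ on $x$, which OCaml types cannot express directly. In the sketch below (an approximation, not the typing discipline of this section), a proof of $\forall x.\neg A$ is read as a function of type term -> 'a -> empty and a proof of $\exists x.A$ as a pair of type term * 'a; the names term, empty and proof are ours.

```ocaml
(* First-order terms, only used here as witnesses. *)
type term = Var of string | Fn of string * term list

(* The empty type, playing the role of falsity. *)
type empty = |

(* A proof of (forall x. not A) => not (exists x. A), reading
   "forall x. not A" as [term -> 'a -> empty] and
   "exists x. A"     as [term * 'a]. *)
let proof : (term -> 'a -> empty) -> term * 'a -> empty =
  fun f e ->
    let (x, a) = e in   (* unpair(e, xa |-> f x a) *)
    f x a
```

The body is exactly the unpair construction: the pair e is destructed into a witness x and a proof a, to which f is applied.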


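Reduction. As usual, the $\beta$-reduction rules correspond to cut-elimination steps:

\[
\dfrac{\dfrac{\dfrac{\pi}{\Gamma \vdash e : A}}{\Gamma \vdash \Lambda x.e : \forall x.A}\,(\forall_I)}{\Gamma \vdash (\Lambda x.e)\,t : A[t/x]}\,(\forall_E)
\quad\rightsquigarrow\quad
\dfrac{\pi[t/x]}{\Gamma \vdash e[t/x] : A[t/x]}
\]

i.e.

\[ (\Lambda x.e)\,t \longrightarrow_\beta e[t/x] \]

and

\[
\dfrac{\dfrac{\dfrac{\pi}{\Gamma \vdash e : A[t/x]}}{\Gamma \vdash \langle t, e\rangle : \exists x.A}\,(\exists_I) \qquad \dfrac{\pi'}{\Gamma, y : A \vdash e' : B}}{\Gamma \vdash \mathrm{unpair}(\langle t, e\rangle, xy \mapsto e') : B}\,(\exists_E)
\quad\rightsquigarrow\quad
\dfrac{\pi'[t/x][\pi/A]}{\Gamma \vdash e'[t/x, e/y] : B}
\]

i.e.

\[ \mathrm{unpair}(\langle t, e\rangle, xy \mapsto e') \longrightarrow_\beta e'[t/x, e/y] \]

where $\pi[t/x]$ is the proof obtained from $\pi$ by replacing all free occurrences
of $x$ by $t$ (details left to the reader). Similarly, $\eta$-reduction rules correspond to
eliminating duals of cuts:

\[
\dfrac{\dfrac{\dfrac{\pi}{\Gamma \vdash e : \forall x.A}}{\Gamma \vdash e\,x : A[x/x]}\,(\forall_E)}{\Gamma \vdash \Lambda x.e\,x : \forall x.A}\,(\forall_I)
\quad\rightsquigarrow\quad
\dfrac{\pi}{\Gamma \vdash e : \forall x.A}
\]

i.e.

\[ \Lambda x.e\,x \longrightarrow_\eta e \]

and

\[
\dfrac{\dfrac{\dfrac{\pi}{\Gamma \vdash e : \exists x.A} \qquad \dfrac{}{\Gamma, y : A \vdash y : A}\,(\mathrm{ax})}{\Gamma \vdash \mathrm{unpair}(e, xy \mapsto y) : A}\,(\exists_E)}{\Gamma \vdash \langle x, \mathrm{unpair}(e, xy \mapsto y)\rangle : \exists x.A}\,(\exists_I)
\quad\rightsquigarrow\quad
\dfrac{\pi}{\Gamma \vdash e : \exists x.A}
\]

i.e.

\[ \langle x, \mathrm{unpair}(e, xy \mapsto y)\rangle \longrightarrow_\eta e \]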
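
The two $\beta$-reduction rules can be sketched on an abstract syntax for the first-order part of expressions. The datatype and function names below are hypothetical, and the substitutions are deliberately naive: we assume that bound names are distinct from the substituted variable and from the variables of the substituted term, so that no capture can occur.

```ocaml
type term = Var of string | Fn of string * term list

(* Expressions, restricted to the first-order constructions. *)
type expr =
  | EVar of string                            (* expression variable y *)
  | TAbs of string * expr                     (* abstraction over a term variable *)
  | TApp of expr * term                       (* e t *)
  | Pair of term * expr                       (* <t, e> *)
  | Unpair of expr * string * string * expr   (* unpair(e, xy |-> e') *)

let rec subst_term t x = function
  | Var y -> if y = x then t else Var y
  | Fn (f, ts) -> Fn (f, List.map (subst_term t x) ts)

(* e[t/x] for a term variable x (assumption: no capture can occur). *)
let rec tsubst t x = function
  | EVar y -> EVar y
  | TAbs (y, e) -> TAbs (y, tsubst t x e)
  | TApp (e, u) -> TApp (tsubst t x e, subst_term t x u)
  | Pair (u, e) -> Pair (subst_term t x u, tsubst t x e)
  | Unpair (e, y, z, e') -> Unpair (tsubst t x e, y, z, tsubst t x e')

(* e[d/y] for an expression variable y, under the same assumption. *)
let rec esubst d y = function
  | EVar z -> if z = y then d else EVar z
  | TAbs (x, e) -> TAbs (x, esubst d y e)
  | TApp (e, t) -> TApp (esubst d y e, t)
  | Pair (t, e) -> Pair (t, esubst d y e)
  | Unpair (e, x, z, e') -> Unpair (esubst d y e, x, z, esubst d y e')

(* One beta-reduction step at the root, if any. *)
let beta = function
  | TApp (TAbs (x, e), t) -> Some (tsubst t x e)
  | Unpair (Pair (t, e), x, y, e') -> Some (esubst e y (tsubst t x e'))
  | _ -> None
```

In the second rule, the term is substituted first and the expression second, which computes $e'[t/x, e/y]$ as long as $x$ does not occur in $e$.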
n 


5.2 Theories 


A first-order theory $\Theta$ on a given signature and set of predicates is a (possibly
infinite) set of closed formulas called axioms. A formula $A$ is provable in a
theory $\Theta$ if there is a finite subset $\Gamma \subseteq \Theta$ such that $\Gamma \vdash A$ is provable. Unless
otherwise specified, the ambient first order logic is usually taken to be classical
when considering first order theories.


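5.2.1 Equality. We often consider theories with equality. This means that we
suppose that we have a predicate “=” of arity 2, together with axioms

\[
\begin{array}{l}
\forall x.x = x \\
\forall x.\forall y.x = y \Rightarrow y = x \\
\forall x.\forall y.\forall z.x = y \Rightarrow y = z \Rightarrow x = z
\end{array}
\]

and, for every function symbol $f$ of arity $n$, we have an axiom

\[ \forall x_1.\forall x_1'.\ldots\forall x_n.\forall x_n'.\; x_1 = x_1' \Rightarrow \ldots \Rightarrow x_n = x_n' \Rightarrow f(x_1,\ldots,x_n) = f(x_1',\ldots,x_n') \]

and, for every predicate $P$ of arity $n$, we have an axiom

\[ \forall x_1.\forall x_1'.\ldots\forall x_n.\forall x_n'.\; x_1 = x_1' \Rightarrow \ldots \Rightarrow x_n = x_n' \Rightarrow P(x_1,\ldots,x_n) \Rightarrow P(x_1',\ldots,x_n') \]

These are sometimes called the congruence axioms.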
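Example 5.2.1.1. The theory of groups is the theory with equality over the
signature $\Sigma = \{\times : 2, 1 : 0\}$ whose axioms are

\[
\begin{array}{lll}
\forall x.1 \times x = x & \forall x.\forall y.\forall z.(x \times y) \times z = x \times (y \times z) & \forall x.\exists y.y \times x = 1 \\
\forall x.x \times 1 = x & & \forall x.\exists y.x \times y = 1
\end{array}
\]

together with the axioms for equality

\[
\begin{array}{l}
\forall x.x = x \\
\forall x.\forall y.x = y \Rightarrow y = x \\
\forall x.\forall y.\forall z.x = y \Rightarrow y = z \Rightarrow x = z \\
\forall x.\forall x'.\forall y.\forall y'.x = x' \Rightarrow y = y' \Rightarrow x \times y = x' \times y' \\
1 = 1
\end{array}
\]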


5.2.2 Properties of theories. A theory is
— consistent when $\bot$ is not provable in the theory,

— complete when for every formula $A$, either $A$ or $\neg A$ is provable in the
theory,

— decidable when there is an algorithm which, given a formula $A$, decides
whether $A$ is provable in the theory or not.

5.2.3 Models. Theories are thought of as describing structures made of sets
and functions satisfying axioms. For instance, the theory of groups of example 5.2.1.1
can be seen as a syntax for groups in the traditional sense. These
structures are called models of the theory and we very briefly recall those here.
We do not even scratch the surface of model theory, and the reader interested
in knowing more is urged to read a standard textbook on the subject such
as [CK90].


Structure. Suppose given a signature $\Sigma$ and a set $\mathcal{P}$ of predicates. A structure $M$
consists of

— a non-empty set $M$ called the domain of the structure,

— a function $[\![f]\!] : M^n \to M$ for every function symbol $f \in \Sigma$ of arity $n$,

— a relation $[\![P]\!] \subseteq M^n$ for every predicate symbol $P \in \mathcal{P}$ of arity $n$.


Interpretation. Suppose fixed such a structure. Given $k \in \mathbb{N}$ and a term $t$ whose
free variables are among $\{x_1,\ldots,x_k\}$, we define its interpretation as the function

\[ [\![t]\!]^k : M^k \to M \]

defined by induction as follows:

\[ [\![x_i]\!]^k : M^k \to M \]

is the canonical $i$-th projection and, for every function symbol $f$ of arity $n$ and
$(m_1,\ldots,m_k) \in M^k$,

\[ [\![f(t_1,\ldots,t_n)]\!]^k(m_1,\ldots,m_k) = [\![f]\!]\big([\![t_1]\!]^k(m_1,\ldots,m_k),\ldots,[\![t_n]\!]^k(m_1,\ldots,m_k)\big) \]

where $[\![f]\!]$ is given by the structure and $[\![t_i]\!]^k$ is computed inductively for every
index $i$. In other words, the interpretation of terms is the only extension of the
structure which is compatible with composition. Given $k \in \mathbb{N}$ and a formula $A$
whose free variables are among $\{x_1,\ldots,x_k\}$, we define its interpretation $[\![A]\!]^k$
as the subset of $M^k$ defined inductively as follows:

\[
\begin{array}{ll}
[\![\bot]\!]^k = \emptyset & [\![\top]\!]^k = M^k \\
[\![A \land B]\!]^k = [\![A]\!]^k \cap [\![B]\!]^k & [\![A \lor B]\!]^k = [\![A]\!]^k \cup [\![B]\!]^k \\
[\![\neg A]\!]^k = M^k \setminus [\![A]\!]^k & [\![A \Rightarrow B]\!]^k = [\![\neg A \lor B]\!]^k
\end{array}
\]

together with

\[ [\![\forall x_{k+1}.A]\!]^k = \bigcap_{m \in M} \{(m_1,\ldots,m_k) \in M^k \mid (m_1,\ldots,m_k,m) \in [\![A]\!]^{k+1}\} \]

and

\[ [\![\exists x_{k+1}.A]\!]^k = \bigcup_{m \in M} \{(m_1,\ldots,m_k) \in M^k \mid (m_1,\ldots,m_k,m) \in [\![A]\!]^{k+1}\} \]

The interpretation of $A$ is thus intuitively the set of values in $M$ for its free
variables making it true.

Satisfaction for closed formulas. Given a closed formula $A$, its interpretation $[\![A]\!]^0$
is a subset of $M^0 = \{()\}$, which is a set with one element, conventionally written
$()$. There are therefore two possible values for $[\![A]\!]^0$: $\emptyset$ and $\{()\}$. In the second
case, we say that the formula $A$ is satisfied in the structure.

Model. A structure is a model of a theory $\Theta$ when each formula in $\Theta$ is satisfied
in the structure.
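
When the domain is finite, satisfaction can be computed by exhaustive enumeration. The following OCaml sketch is one way of doing so, under several assumptions of ours: the domain is a list of integers, variables are valued by an environment (an association list) instead of the tuples indexed by $k$ used above, and atomic formulas are interpreted in the standard way, by evaluating the arguments and looking up the relation. The structure z3 (integers modulo 3 with addition) is an illustrative choice for the signature of groups.

```ocaml
type term = Var of string | Fn of string * term list

type formula =
  | Pred of string * term list
  | And of formula * formula
  | Or of formula * formula
  | Imp of formula * formula
  | Not of formula
  | True
  | False
  | Forall of string * formula
  | Exists of string * formula

(* A structure with a finite domain of integers. *)
type structure = {
  domain : int list;
  func : string -> int list -> int;    (* interpretation of function symbols *)
  pred : string -> int list -> bool;   (* interpretation of predicates *)
}

(* Value of a term in an environment giving a value to its variables. *)
let rec eval m env = function
  | Var x -> List.assoc x env
  | Fn (f, ts) -> m.func f (List.map (eval m env) ts)

(* Whether the environment belongs to the interpretation of the formula. *)
let rec sat m env = function
  | Pred (p, ts) -> m.pred p (List.map (eval m env) ts)
  | And (a, b) -> sat m env a && sat m env b
  | Or (a, b) -> sat m env a || sat m env b
  | Imp (a, b) -> not (sat m env a) || sat m env b
  | Not a -> not (sat m env a)
  | True -> true
  | False -> false
  | Forall (x, a) -> List.for_all (fun v -> sat m ((x, v) :: env) a) m.domain
  | Exists (x, a) -> List.exists (fun v -> sat m ((x, v) :: env) a) m.domain

(* Integers modulo 3 with addition, as a structure for the signature of groups. *)
let z3 = {
  domain = [0; 1; 2];
  func = (fun f args ->
    match f, args with
    | "*", [a; b] -> (a + b) mod 3
    | "1", [] -> 0
    | _ -> invalid_arg "unknown function symbol");
  pred = (fun p args ->
    match p, args with
    | "=", [a; b] -> a = b
    | _ -> invalid_arg "unknown predicate");
}

(* The formula of example 5.1.4.1, stating that every element has an inverse. *)
let inverse =
  let ( * ) a b = Fn ("*", [a; b]) and one = Fn ("1", []) in
  let eq a b = Pred ("=", [a; b]) in
  Forall ("x", Exists ("y",
    And (eq (Var "x" * Var "y") one, eq (Var "y" * Var "x") one)))
```

For a closed formula, the empty environment plays the role of the unique element $()$ of $M^0$: the formula is satisfied in the structure when sat z3 [] inverse evaluates to true.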


Example 5.2.3.1. Consider the theory of groups (example 5.2.1.1). A structure
consists of

— a set $M$,

— a function $[\![\times]\!] : M^2 \to M$,

— a constant $[\![1]\!] : M^0 \to M$,

— a relation $[\![=]\!] \subseteq M \times M$.

We say that such a structure has strict equality when the interpretation of the
equality is the diagonal relation

\[ [\![=]\!] = \{(m, m) \mid m \in M\} \]

Such a structure $M$ is a model of the theory of groups, i.e. is a model for all
its axioms, precisely if $(M, [\![\times]\!], [\![1]\!])$ is a group in the traditional sense, and
conversely every group gives rise to a model where equality is interpreted in
such a way: the models with strict equality of the theory of groups are precisely
groups.


Remark 5.2.3.2. As can be seen in the previous example, it is often useful to 
restrict to models with strict equality. Since equality is always a congruence 
(because of the axioms imposed in section 5.2.1), from any model we can con- 
struct a model with strict equality by quotienting the model under the relation 
interpreting equality, so that this assumption is not very restrictive. 


Validity. A sequent

\[ y_1 : A_1,\ldots,y_n : A_n \vdash A \]

is satisfied in $M$ if for every $k \in \mathbb{N}$ such that the free variables of the sequent
are in $\{x_1,\ldots,x_k\}$, we have

\[ [\![A_1]\!]^k \cap \ldots \cap [\![A_n]\!]^k \subseteq [\![A]\!]^k \]

which is equivalent to requiring that $[\![\neg(A_1 \land \ldots \land A_n \Rightarrow A)]\!]^k$ is empty. A
sequent is valid when it is satisfied in every model.


Correctness. We can now formally state the fact that our notion of semantics is 
compatible with our logical system. 


Theorem 5.2.3.3 (Correctness). Every derivable sequent is valid. 


Proof. By induction on the derivation of the sequent. 


The above theorem has the following important particular case: 




Corollary 5.2.3.4. For every theory $\Theta$ and closed formula $A$ such that $\Theta \vdash A$ is
derivable, every model of $\Theta$ is also a model of $A$.


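Example 5.2.3.5. In the theory of groups, one can show

\[ \forall x.\forall y.\forall y'.x \times y = 1 \Rightarrow y' \times x = 1 \Rightarrow y = y' \]

by formalizing the following sequence of implications of equalities:

\[
\begin{array}{l}
x \times y = 1 \\
y' \times (x \times y) = y' \times 1 \\
y' \times (x \times y) = y' \\
(y' \times x) \times y = y' \\
1 \times y = y' \\
y = y'
\end{array}
\]

By correctness, it holds in every group: a left inverse of an element coincides
with any right inverse of the same element.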
The contrapositive of the above theorem is also quite useful: 


Corollary 5.2.3.6. For every theory $\Theta$ and closed formula $A$, if there exists a
model of $\Theta$ which is not a model of $A$ then $\Theta \vdash A$ is not derivable.

Example 5.2.3.7. In the theory of groups, consider the formula

\[ \forall x.\forall y.x \times y = y \times x \]


We know that there exist non-abelian groups, for instance the symmetric group 
on 3 elements S3. Such a non-abelian group being a model for the theory of 
groups but not for the above formula, we can conclude that this formula cannot 
be deduced in the theory of groups. 
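
The failure of commutativity in $S_3$ can be checked concretely. The sketch below represents a permutation of $\{0, 1, 2\}$ by the array of its images (a representation chosen for the illustration) and composes two transpositions in both orders.

```ocaml
(* A permutation of {0, 1, 2}, given as the array of images. *)
type perm = int array

(* Composition: (compose p q) maps i to p.(q.(i)). *)
let compose (p : perm) (q : perm) : perm = Array.map (fun i -> p.(i)) q

let identity : perm = [| 0; 1; 2 |]
let swap01 : perm = [| 1; 0; 2 |]   (* exchanges 0 and 1 *)
let swap12 : perm = [| 0; 2; 1 |]   (* exchanges 1 and 2 *)

(* The two products differ, so x * y = y * x fails in S3. *)
let commute = compose swap01 swap12 = compose swap12 swap01
```

Since commute is false, this structure is a model of the theory of groups in which the formula is not satisfied.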


Finally, a major consequence of the theorem is the following. A theory is satis- 
fiable when it admits a model. 


Proposition 5.2.3.8. A satisfiable theory is consistent.

Proof. Suppose that $\Theta$ is a theory with a model $M$. If we had $\Theta \vdash \bot$ then, by
corollary 5.2.3.4, we would have that $M$ is a model of $\bot$, which it is not since
$[\![\bot]\!] = \emptyset$ by definition.


Remark 5.2.3.9. As explained in section 5.1.10, the handling of first-order vari- 
ables in traditional first-order logic is not entirely satisfactory: we do not keep 
track of the free variables we use. This is why we have to have k (the number of 
first-order variables) as a parameter for the interpretation. This is also why we 
need to restrict to non-empty domains in structures. For instance, the sequent 
$\vdash \exists x.\top$ is always derivable and it would not be satisfied in the structure with
an empty domain. See section 5.1.10 for a solution to this issue. 


Skolemization. Suppose fixed a signature $\Sigma$ and a set $\mathcal{P}$ of predicates. A formula
on $(\Sigma,\mathcal{P})$ of the form

\[ \forall x.\exists y.A(x, y) \]

states that for every element $x$ there exists a $y$ such that $A(x, y)$ is satisfied.
When this formula admits a model, we can construct a function $f$ which to
every $x$ associates one of the associated $y$. Thus it implies that the formula

\[ \forall x.A(x, f(x)) \]

on $(\Sigma',\mathcal{P})$ is also satisfiable, where the signature $\Sigma'$ is $\Sigma$ extended with a symbol
$f$ of arity one. By a similar reasoning, one shows that the satisfiability of
the second formula implies the satisfiability of the first formula. The two formulas
are thus equisatisfiable: one is satisfiable if and only if the other is. This
process of “replacing existential quantifications by function symbols” is due to
Skolem: it allows replacing a theory by another equisatisfiable theory whose
axioms do not contain existential quantifications, see section 5.4.6.
More generally, given a formula on $(\Sigma,\mathcal{P})$ of the form

\[ \forall x_1.\ldots\forall x_n.\exists y.A \]

a skolemization of it is the formula

\[ \forall x_1.\ldots\forall x_n.A[f(y_1,\ldots,y_m)/y] \]

on the signature $(\Sigma',\mathcal{P})$ where $\mathrm{FV}(\exists y.A) = \{y_1,\ldots,y_m\}$ and $\Sigma'$ is $\Sigma$ extended
with a fresh symbol $f$ of arity $m$.

Proposition 5.2.3.10. A formula of the form $\forall x_1.\ldots\forall x_n.\exists y.A$ is satisfiable if
and only if its skolemization is.


Example 5.2.3.11. In the theory of groups from example 5.2.1.1, we can skolemize
the axiom $\forall x.\exists y.y \times x = 1$. This forces us to introduce a new unary function
symbol $i$ (which will be the function that to an element associates its inverse)
and reformulate the axiom as $\forall x.i(x) \times x = 1$.
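
On the semantic side, an interpretation of the Skolem symbol is obtained by choosing, for each element, one witness of the existential quantification. The following sketch illustrates this in the additive group of integers modulo 5, a structure picked for the example; the names mul, one and inv are ours.

```ocaml
let n = 5
let domain = List.init n (fun i -> i)

(* Interpretation of the group operation and of the unit in Z/5Z. *)
let mul a b = (a + b) mod n
let one = 0

(* Interpretation of the Skolem symbol i: for each x, pick some y such that
   y * x = 1. List.find raises Not_found when there is no such y, that is
   when the original axiom is not satisfied. *)
let inv x = List.find (fun y -> mul y x = one) domain

(* The skolemized axiom: for every x, i(x) * x = 1. *)
let skolemized_holds = List.for_all (fun x -> mul (inv x) x = one) domain
```

The function inv is built from a model of the original axiom, and skolemized_holds checks that it makes the skolemized axiom true, which is the left-to-right direction of proposition 5.2.3.10 on this example.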


Remark 5.2.3.12. If we allowed ourselves to perform this process for any existential
quantification in a formula, proposition 5.2.3.10 would not be true. For instance, the
formula $\neg(\exists x.g(x) = x)$ is satisfiable when $g$ has no fixpoint. If we “skolemize”
it, we obtain the formula $\neg(g(f()) = f())$ where $f$ is a fresh nullary function
symbol: this formula is satisfiable when there is an element which is not a
fixpoint for $g$. The two are thus not equisatisfiable.


5.2.4 Presburger arithmetic. Presburger arithmetic axiomatizes
addition over natural numbers. It is the theory with equality over the signature
$\Sigma = \{0 : 0, S : 1, + : 2\}$ whose axioms are those for equality together with

\[
\begin{array}{l}
\forall x.0 = S(x) \Rightarrow \bot \\
\forall x.\forall y.S(x) = S(y) \Rightarrow x = y \\
\forall x.0 + x = x \\
\forall x.\forall y.S(x) + y = S(x + y)
\end{array}
\]

together with, for every formula $A(x)$ with one free variable $x$, an axiom

\[ A(0) \Rightarrow (\forall x.A(x) \Rightarrow A(S(x))) \Rightarrow \forall x.A(x) \]

Such an infinite family of axioms is sometimes called an axiom scheme: it expresses
here the induction principle.

Example 5.2.4.1. For instance, $\forall x.x + 0 = x$ can be proved by induction on $x$.
Namely, consider the formula $A(x)$ being $x + 0 = x$. We have

— $A(0)$: $0 + 0 = 0$.
— Suppose $A(x)$, we have $A(S(x))$, namely $S(x) + 0 = S(x + 0) = S(x)$.
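
The two axioms for addition can be read as a recursive definition on unary natural numbers. The OCaml sketch below follows them literally and tests the instances $A(n)$ of the above property for small $n$; the names nat, add, of_int and check_right_unit are ours. Of course, checking finitely many instances is only testing in the sense of section 0.1: the statement for all $x$ is what the induction scheme provides.

```ocaml
(* Unary natural numbers, built from the symbols 0 and S. *)
type nat = Z | S of nat

(* Addition, obtained by orienting the two axioms from left to right:
   0 + x = x and S(x) + y = S(x + y). *)
let rec add x y =
  match x with
  | Z -> y
  | S x' -> S (add x' y)

(* Unary representation of a non-negative machine integer. *)
let rec of_int n = if n <= 0 then Z else S (of_int (n - 1))

(* Check the instance A(n), i.e. n + 0 = n, for all n below a bound. *)
let check_right_unit bound =
  List.for_all
    (fun n -> add (of_int n) Z = of_int n)
    (List.init bound (fun i -> i))
```

Note that add recurses on its first argument, so that $0 + x = x$ holds by definition whereas $x + 0 = x$ requires the induction above.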


This theory was shown by Presburger to be consistent, complete and decidable [Pre29].
In the worst case, any decision algorithm has a complexity $O(2^{2^{cn}})$
with respect to the size $n$ of the formula to decide [FR98], although it is useful
in practice (it is for example implemented in the tactic omega of Coq). It is also
very weak: for instance, one cannot define the multiplication function in it (if
we could, it would not be decidable, see next section).


5.2.5 Peano and Heyting arithmetic. Peano arithmetic, often written
PA, extends Presburger arithmetic by also axiomatizing multiplication. It is
the theory with equality on the signature $\Sigma = \{0 : 0, S : 1, + : 2, \times : 2\}$ whose
axioms are those of equality, those of Presburger arithmetic, and

\[
\begin{array}{l}
\forall x.0 \times x = 0 \\
\forall x.\forall y.S(x) \times y = y + (x \times y)
\end{array}
\]

This theory is implicitly understood with an ambient classical first order logic.
When the logic is intuitionistic, the theory is called Heyting arithmetic (or HA).

Exercise 5.2.5.1. In HA, prove $\forall x.x + 0 = x$.


Consistency. The second of Hilbert’s list of 23 problems posed in 1900 consisted
in showing that Peano arithmetic is consistent, i.e. cannot be used to prove $\bot$, or
equivalently that $0 = S(0)$ cannot be proved. A natural reaction would be to use
corollary 5.2.3.4 and build a model for this theory, whose existence would imply
its consistency, and there is an obvious model: the set $\mathbb{N}$ of natural numbers with
usual zero, successor, addition and multiplication functions. However, the usual
construction of this set of natural numbers is itself performed inside (models
of) set theory (see section 5.3) which is a much stronger theory. All this would
prove is that if set theory is consistent then Peano arithmetic is consistent,
which is like proving that if we have a nuclear bomb then we can kill a fly.
This is why people first hoped to prove the consistency of Peano arithmetic in
theories as weak as possible and, why not, in Peano arithmetic itself. However, in
1931, Gödel showed in his second incompleteness theorem that Peano arithmetic
cannot prove its own consistency [Göd31] (unless it is inconsistent). The cut
elimination procedure was then introduced by Gentzen in 1936 precisely in order
to show the consistency of Heyting arithmetic using methods similar to those of
theorem 5.1.9.2, although the proof is more involved due to the presence of
the axioms of the theory. From this, one can deduce the consistency of Peano
arithmetic using double negation translations of Peano arithmetic into Heyting
arithmetic as in section 2.5.9, see [Gen36].


Induction up to $\varepsilon_0$. Gentzen’s proof brings no contradiction with Gödel’s theorem,
because this proof (or more precisely the proof that the cut-elimination
procedure terminates) requires more than the induction principle: we need a
(** Finite rooted trees. *)
type tree = T of tree list


(** Lexicographic extension of an order. *)
let rec lex le l1 l2 =
  match l1, l2 with
  | x::l1, y::l2 ->
    if x = y then lex le l1 l2
    else le x y
  | [], _ -> true
  | _, [] -> false


(** Order on trees. *)
let rec le t1 t2 =
  match t1, t2 with
  | T l1, T l2 ->
    (* Sort the sons decreasingly (greater sons first), as required
       by the lexicographic comparison. *)
    let cmp t1 t2 =
      if t1 = t2 then 0
      else if le t1 t2 then 1 else -1
    in
    let l1 = List.sort cmp l1 in
    let l2 = List.sort cmp l2 in
    lex le l1 l2


Figure 5.1: $\varepsilon_0$ in OCaml.


transfinite induction up to the ordinal $\varepsilon_0$ (which is $\omega$ to the power $\omega$ to the
power $\omega$ and so on, i.e. $\varepsilon_0 = \omega^{\omega^{\omega^{\cdots}}}$). In other words, while induction only requires
us to believe that the set of natural numbers is well-founded, the transfinite
induction up to $\varepsilon_0$ now requires that the following set of trees is well-founded. By
a classical result in ordinal arithmetic (which we cannot detail here), any ordinal
$\alpha < \varepsilon_0$ can be uniquely written as $\alpha = \omega^{\beta_1} + \ldots + \omega^{\beta_n}$ where $\alpha > \beta_1 \geq \ldots \geq \beta_n$
are ordinals: this is called the Cantor normal form of $\alpha$, each of the $\beta_i$ having
a similar normal form. Such an ordinal $\alpha$ can thus be represented as a planar
rooted tree, with one root and $n$ sons, which are the trees corresponding to the
$\beta_i$. For instance, the ordinals we t1 49 and w’ -3+42 respectively correspond
to the trees
[Figure: the two planar rooted trees corresponding to these ordinals.]

These trees can be compared by lexicographically comparing the sons of the 
root (which are supposed to be ordered decreasingly), so that for instance, the 


tree on the left above is greater than the one on the right. An implementation 
of this order is provided in figure 5.1. This order can also be interpreted using 
the following Hydra game on trees [KP82]. This game with two players starts
with a tree as above and at each turn 


— the first player removes a leaf x (a node without sons) of the tree, 


— the second player chooses a number n, looks for the parent y of x and the 
parent z of y (it does nothing if no such parents exist), and adds n copies 
of the tree with y as root as new children of z. 


The game stops when the tree is reduced to its root. We now see where the
game draws its name from: the first player cuts a head of the Hydra, but in
response the Hydra grows many new heads! For instance, in the figure above,
the tree on the right is obtained from the one on the left after one round. Given
trees $\alpha$ and $\beta$, it can be shown that $\alpha > \beta$ if and only if $\beta$ can be obtained after
some finite number of rounds of the game starting from $\alpha$. Believing that $\varepsilon_0$ is
well-founded is thus equivalent to believing that every such game will necessarily
end (try it, to convince yourself that it always does!).
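
To experiment with the game, one round can be sketched in OCaml as follows. This is a minimal sketch under assumptions made here. A leaf is designated by its path from the root (a list of indices of sons). The copies added by the second player are copies of the tree rooted at y once the leaf has been removed. The names `remove_nth`, `replace_nth` and `round` are hypothetical.

```ocaml
(* Finite rooted trees, as in figure 5.1. *)
type tree = T of tree list

(* Remove the element at index i of a list. *)
let rec remove_nth i = function
  | [] -> invalid_arg "remove_nth"
  | x :: l -> if i = 0 then l else x :: remove_nth (i - 1) l

(* Replace the element at index i of a list by y. *)
let replace_nth i y l = List.mapi (fun k x -> if k = i then y else x) l

(* One round: the first player removes the leaf reached by following
   [path] from the root, and the second player answers with [n]. *)
let rec round n path (T sons) =
  match path with
  | [] -> invalid_arg "round: the root cannot be removed"
  | [i] ->
    (* The leaf is a son of the root: it has no grandparent, so that
       nothing grows back. *)
    if List.nth sons i <> T [] then invalid_arg "round: not a leaf";
    T (remove_nth i sons)
  | [i; j] ->
    (* The current node is z, its son y has index i, and the leaf x is
       the son of y with index j. *)
    let (T ys) = List.nth sons i in
    if List.nth ys j <> T [] then invalid_arg "round: not a leaf";
    let y = T (remove_nth j ys) in
    T (replace_nth i y sons @ List.init n (fun _ -> y))
  | i :: path -> T (replace_nth i (round n path (List.nth sons i)) sons)
```

For instance, starting from the tree `T [T [T []]]`, removing the only leaf with the answer `n = 2` gives `T [T []; T []; T []]`: the Hydra has lost one level but has gained two heads.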


Undecidability. Finally, we would like to mention that Peano arithmetic is also 
undecidable, which was shown by Turing [Tur37]. Namely, a sequence of con- 
figurations of a Turing machine can be suitably encoded as an integer, so that 
one can write a formula expressing that a given natural number encodes such 
a sequence, which ends on an accepting configuration. From there, it is easy to 
construct a formula expressing the fact that the machine is not halting, and such 
formulas cannot be decided, otherwise we would decide the halting problem. 


5.3 Set theory 


Set theory is a first-order theory whose intended models are sets. Everything is 
a set there, in particular the elements of sets are themselves sets. This theory 
was defined at the beginning of the 20th century while looking for axiomatic 
foundations of mathematics. We only briefly scratch the subject and refer to 
standard textbooks [Kri98, Deh17] for more details. 


5.3.1 Naive set theory. The naive set theory is the theory with a binary predicate
“$\in$” and the following axiom scheme, called unrestricted comprehension


$$\exists y.\forall x.\, x \in y \Leftrightarrow A$$


for every formula $A$ with $x$ as only free variable. Informally, this states for every
property $A(x)$, the existence of a set


$$y = \{x \mid A(x)\}$$


of elements x satisfying the property A(x). This theory is surprisingly simple 
and works surprisingly well: we can perform all the usual constructions: 


— the empty set is $\{x \mid \bot\}$,

— the union of two sets is $x \cup y = \{z \mid z \in x \lor z \in y\}$,

— the intersection of two sets is $x \cap y = \{z \mid z \in x \land z \in y\}$,

— the product of two sets is $x \times y = \{(i, j) \mid i \in x \land j \in y\}$ with the notation
$(i, j) = \{\{i\}, \{i, j\}\}$,

— the inclusion of two sets is $x \subseteq y = \forall z.\, z \in x \Rightarrow z \in y$,

— the powerset is $\mathcal{P}(x) = \{y \mid y \subseteq x\}$,


and so on. 


Russell’s paradox. There is only a “slight” problem with this theory: it is in- 
consistent, meaning that we can in fact prove any formula, which explains why 
everything was so simple. This was first formalized by Russell in 1901, using 
what is known nowadays as the Russell paradox, which goes as follows. Consider 
the property 

$$A = \lnot(x \in x)$$


The unrestricted comprehension scheme ensures the existence of a set $y$ such
that

$$\forall x.\, x \in y \Leftrightarrow \lnot(x \in x)$$


In particular, for $x$ being $y$, we have


$$y \in y \Leftrightarrow \lnot(y \in y)$$


In classical logic, we can easily derive an inconsistency:


— if $y \in y$ then $\lnot(y \in y)$ and therefore we can prove $\bot$,


— if $\lnot(y \in y)$ then $y \in y$ and therefore we can prove $\bot$.


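Russell’s paradox in intuitionistic logic. This proof can be thought of as reasoning
by case analysis on whether $y \in y$ is true or not and, as such, it seems
that it is not intuitionistically valid because we are using the excluded middle.
However, it can also be considered as a valid intuitionistic proof: namely, the
two cases amount to


— prove $\lnot(y \in y)$, and


— prove $\lnot\lnot(y \in y)$,


from which we can deduce $\bot$. More generally, for any formula $A$, one can show
intuitionistically that

$$(A \Leftrightarrow \lnot A) \Rightarrow \bot$$


Namely, the equivalent formula

$$(A \Rightarrow \lnot A) \Rightarrow (\lnot A \Rightarrow A) \Rightarrow \bot$$

can be proved as follows, writing $\Gamma = A \Rightarrow \lnot A, \lnot A \Rightarrow A$:


— $\Gamma, A \vdash A \Rightarrow \lnot A$ and $\Gamma, A \vdash A$ hold by (ax), thus $\Gamma, A \vdash \lnot A$ by ($\Rightarrow_E$),

— using $\Gamma, A \vdash A$ by (ax) again, we deduce $\Gamma, A \vdash \bot$ by ($\lnot_E$), thus $\Gamma \vdash \lnot A$ by ($\lnot_I$),

— from $\Gamma \vdash \lnot A \Rightarrow A$ by (ax) and $\Gamma \vdash \lnot A$, we deduce $\Gamma \vdash A$ by ($\Rightarrow_E$),

— from $\Gamma \vdash \lnot A$ and $\Gamma \vdash A$, we deduce $\Gamma \vdash \bot$ by ($\lnot_E$), and we conclude by ($\Rightarrow_I$) twice.


The corresponding proof-term is

$$\lambda f^{A \Rightarrow \lnot A}.\lambda g^{\lnot A \Rightarrow A}.(\lambda a^{A}.f\,a\,a)\,(g\,(\lambda a^{A}.f\,a\,a))$$

Another, more symmetrical proof term for the same formula is

$$\lambda f^{A \Rightarrow \lnot A}.\lambda g^{\lnot A \Rightarrow A}.f\,(g\,(\lambda a^{A}.f\,a\,a))\,(g\,(\lambda a^{A}.f\,a\,a))$$


In both cases, note that we recover the looping term $\Omega = (\lambda x.x\,x)(\lambda x.x\,x)$ if we
apply it to the identity twice.
The so-called Curry paradox is the following slight generalization of the
above formula

$$(A \Leftrightarrow (A \Rightarrow B)) \Rightarrow B$$


and can be shown using the same $\lambda$-terms.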
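
These proof terms can be checked by a typechecker. The sketch below transcribes the first one in OCaml, under conventions chosen here: the empty type `bot` stands for $\bot$ (its declaration assumes OCaml 4.07 or later), $\lnot A$ is encoded as `'a -> bot`, and the names `russell` and `curry` are hypothetical.

```ocaml
(* The empty type, standing for falsity. *)
type bot = |

(* (A => not A) => (not A => A) => false *)
let russell (f : 'a -> 'a -> bot) (g : ('a -> bot) -> 'a) : bot =
  let not_a a = f a a in  (* a proof of not A *)
  not_a (g not_a)

(* Curry's generalization: (A => A => B) => ((A => B) => A) => B,
   with exactly the same term. *)
let curry (f : 'a -> 'a -> 'b) (g : ('a -> 'b) -> 'a) : 'b =
  let a_to_b a = f a a in
  a_to_b (g a_to_b)
```

The function `russell` can never be applied to actual arguments of a consistent type, but the fact that it is accepted witnesses that the formula is provable. The function `curry` can be run: for instance `curry (fun () () -> 42) (fun _ -> ())` evaluates to `42`.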


Size issues. The problem with naive set theory is due to size: the collection of all
sets is “too big” to actually form a set. Once this issue was identified, subsequent
attempts at formalizing set theory have all tried to take it into account: we
should not be able to consider this collection as a set, and therefore we cannot
consider the set of all sets which satisfy a property, such as not belonging to itself...


Other paradoxes. Other paradoxes can be used to show the inconsistency of 
naive set theory. For instance, an argument based on Cantor’s theorem is the 
following one. Suppose that there exists a set u of all sets. Every subset x of u is 
a set, and thus an element of u. In this way, we can construct an injection from 
the powerset of u to u, which is excluded by Cantor’s diagonal argument, see 
appendix A.4. Another classical paradox is the one of Burali-Forti, presented 
in section 8.2.3. 


5.3.2 Zermelo-Fraenkel set theory. The above observations led to refined
axiomatizations of set theory, the most popular being called Zermelo-Fraenkel
set theory, or ZF [Zer08]. We make a very brief presentation of it here, mostly
discussing the axiom of choice. This is the classical first order theory with
equality with a binary predicate $\in$, whose axioms are the following.


Axiom of extensionality. This axiom states that two sets with the same elements 
are equal: 
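$$\forall x.\forall y.((\forall z.\, z \in x \Leftrightarrow z \in y) \Rightarrow x = y)$$


If we introduce the notation $x \subseteq y$ for the formula $\forall z.\, z \in x \Rightarrow z \in y$ which
expresses that the set $x$ is included in the set $y$, the axiom of extensionality can
be rephrased as


$$\forall x.\forall y.(x \subseteq y \land y \subseteq x) \Rightarrow x = y$$


i.e. two sets are equal precisely when they have the same elements.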
Axiom of union. This axiom states that the union of the elements of a set is
still a set:


$$\forall x.\exists y.\forall i.(i \in y \Leftrightarrow \exists z.(i \in z \land z \in x))$$

In more usual notation, this states the existence, for every set $x$, of the set

$$\bigcup x = \bigcup_{z \in x} z$$


In particular, we can construct the union of two sets $x$ and $y$ as


$$x \cup y = \bigcup \{x, y\}$$


where the set {x,y} is constructed using the axiom schema of replacement, see 
below. 


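Axiom of powerset. This axiom states that given a set $x$, there is a set whose
elements are precisely the subsets of $x$, usually called the powerset of $x$ and
written $\mathcal{P}(x)$:


$$\forall x.\exists y.\forall z.(z \in y \Leftrightarrow (\forall i.\, i \in z \Rightarrow i \in x))$$


In more usual notation,


$$\forall x.\exists y.\forall z.(z \in y \Leftrightarrow z \subseteq x)$$


i.e. we can construct the set


$$y = \mathcal{P}(x) = \{z \mid z \subseteq x\}$$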


Axiom of infinity. The axiom of infinity states the existence of an infinite set: 


$$\exists x.(\emptyset \in x \land \forall y.\, y \in x \Rightarrow S(y) \in x)$$


where the empty set $\emptyset$ is defined using the axiom schema of replacement below
and $S(y) = y \cup \{y\}$ is the successor of a set. A set is called inductive when
it contains the empty set and is closed under successor: the axiom states the
existence of an inductive set. In particular, the set $\mathbb{N}$ of natural numbers can be
constructed as the intersection of all inductive sets. Here, the natural numbers
are encoded following the von Neumann convention:


$$0 = \emptyset \qquad 1 = 0 \cup \{0\} = \{\emptyset\} \qquad 2 = 1 \cup \{1\} = \{\emptyset, \{\emptyset\}\}$$


and more generally $n + 1 = n \cup \{n\}$. The definition immediately implies the
following principle of induction: every inductive subset of the natural numbers
is the set of natural numbers.
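
The von Neumann encoding can be made concrete on hereditarily finite sets. The following OCaml sketch assumes a naive representation chosen here, where a set is given by the list of its elements (so that equality of sets only coincides with equality of lists when elements are listed in the same order and without repetition, which is the case below). The names `set`, `successor`, `von_neumann` and `cardinal` are hypothetical.

```ocaml
(* Hereditarily finite sets: a set is given by the list of its elements. *)
type set = Set of set list

(* The empty set. *)
let empty = Set []

(* The successor S(y) = y U {y}: we add y itself to the elements of y. *)
let successor (Set l as y) = Set (l @ [y])

(* The von Neumann encoding of a natural number. *)
let rec von_neumann n =
  if n <= 0 then empty else successor (von_neumann (n - 1))

(* Number of listed elements. *)
let cardinal (Set l) = List.length l
```

For instance, `von_neumann 2` is `Set [Set []; Set [Set []]]`, i.e. $\{\emptyset, \{\emptyset\}\}$, and the set encoding $n$ has exactly $n$ elements.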


Axiom schema of replacement. This axiom states that the image of a set under
a partial function is a set:


$$(\forall i.\forall j.\forall j'.(A \land A[j'/j] \Rightarrow j = j')) \Rightarrow \forall x.\exists y.\forall j.(j \in y \Leftrightarrow \exists i.(i \in x \land A))$$


where $A$ is any formula such that $j' \notin \mathrm{FV}(A)$ (but $A$ might contain $i$ or $j$ or
other free variables). This is thus an axiom schema: it is an infinite family of
axioms, one for each such formula $A$.
For simplicity, we consider the case where the formula contains only $i$ and $j$
as free variables, and is thus written $A(i, j)$. In this case the axiom reads as


$$(\forall i.\forall j.\forall j'.(A(i, j) \land A(i, j') \Rightarrow j = j')) \Rightarrow$$
$$\forall x.\exists y.\forall j.(j \in y \Leftrightarrow \exists i.(i \in x \land A(i, j)))$$


The formula $A$ encodes a relation: a set $i$ is in relation with a set $j$ when $A(i, j)$
is true. In particular, the relation corresponds to a partial function when every
element $i$ is in relation with at most one element $j$, i.e.


$$\forall i.\forall j.\forall j'.(A(i, j) \land A(i, j') \Rightarrow j = j')$$


Namely, such a relation corresponds to the partial function $f$ from sets to sets
such that $f(i)$ is the unique $j$ such that $A(i, j)$, should there exist one, and is
undefined otherwise. Then, our axiom states that given a set $x$, we can construct
the set


$$y = \{j \mid \exists i \in x.\, A(i, j)\}$$

of its images under $f$.
For instance, the empty set $\emptyset$ is defined as the set $y$ obtained in this way
from any set $x$ (and there exists one by the axiom of infinity) using the nowhere
defined function, which can be encoded as the relation $A = \bot$:


$$\emptyset = \{j \mid \exists i \in x.\, \bot\}$$


Given two sets $x$ and $y$, we can construct the set $\{x, y\}$ as the image of the
partial function over the natural numbers sending $0$ (i.e. $\emptyset$) to $x$ and $1$ (i.e. $\{\emptyset\}$)
to $y$:


$$\{x, y\} = \{j \mid \exists i \in \mathbb{N}.\, (i = 0 \land j = x) \lor (i = 1 \land j = y)\}$$

and we can similarly construct a set containing any finite given family of sets.
Given two sets $x$ and $y$, we can construct their intersection as


$$x \cap y = \{j \mid \exists i \in x \cup y.\, j = i \land i \in x \land i \in y\}$$


Given two sets $x$ and $y$, we can encode a pair of elements $i_1 \in x$ and $i_2 \in y$ as
$(i_1, i_2) = \{\{i_1\}, \{i_1, i_2\}\}$, which is an element of $\mathcal{P}(\mathcal{P}(x \cup y))$, and thus construct
the product of the two sets as


$$x \times y = \{j \mid \exists i \in \mathcal{P}(\mathcal{P}(x \cup y)).\, \exists i_1.\exists i_2.\, j = i \land i = (i_1, i_2) \land i_1 \in x \land i_2 \in y\}$$


More generally, given a predicate $B(i)$ and a set $x$, we can construct the
set of elements of $x$ satisfying $B$ as


$$\{i \in x \mid B(i)\} = \{j \mid \exists i \in x.\, i = j \land B(i)\}$$
This construction corresponds to what is sometimes called the axiom schema of
restricted comprehension and can formally be stated as


$$\forall x.\exists y.\forall i.(i \in y \Leftrightarrow (i \in x \land A))$$

where $A$ is a formula such that $x, y \notin \mathrm{FV}(A)$, but $A$ might contain $i$ or other
free variables, i.e. in usual notation, for every set $x$ we can construct the set


$$y = \{i \in x \mid A\}$$


Note that compared to the unrestricted comprehension scheme, which was at the 
source of Russell’s paradox, we can only construct the set of elements of some set 
which satisfy A. 
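
On finite sets, replacement is simply the image under a partial function, and the derived constructions above follow the same pattern. The following OCaml sketch makes assumptions of its own: sets are represented by sorted lists without duplicates, partial functions by functions returning an `option`, and the names `replacement`, `empty_from`, `comprehension` and `inter` are hypothetical. The intersection is obtained here by comprehension on $x$, which gives the same set as the formula ranging over $x \cup y$.

```ocaml
(* Finite sets, represented by sorted lists without duplicates. *)
let set_of_list l = List.sort_uniq compare l

(* Replacement: the image of the set x under the partial function f
   (f i = None means that f is undefined on i). *)
let replacement f x = set_of_list (List.filter_map f x)

(* The empty set, as the image under the nowhere defined function. *)
let empty_from x = replacement (fun _ -> None) x

(* Restricted comprehension: the elements of x satisfying b. *)
let comprehension b x =
  replacement (fun i -> if b i then Some i else None) x

(* Intersection, as a particular case of comprehension. *)
let inter x y = comprehension (fun i -> List.mem i y) x
```

For instance, `comprehension (fun i -> i > 1) [1; 2; 3]` is `[2; 3]`: as in the restricted scheme, we can only select elements among those of a set we already have.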
Axiom of foundation. The axiom of foundation states that every non-empty set
contains a member which is disjoint from the whole set:


$$\forall x.(\exists y.\, y \in x) \Rightarrow \exists y.(y \in x \land \lnot\exists i.(i \in y \land i \in x))$$


or, in modern notation,


$$\forall x.\, x \neq \emptyset \Rightarrow \exists y \in x.\, y \cap x = \emptyset$$


One of the main consequences of the axiom of foundation is the following:


Lemma 5.3.2.1. There is no infinite sequence of sets $(x_i)$ such that $x_{i+1} \in x_i$.


Proof. Suppose the contrary. The sequence of sets can be seen as a function $f$
with $\mathbb{N}$ as domain, which to every $i$ associates $f(i) = x_i$. By the axiom schema
of replacement, its image $x = \{x_i \mid i \in \mathbb{N}\}$ is also a set and by the axiom of
foundation there exists $y \in x$ such that $y \cap x = \emptyset$. By definition of $x$, there exists
some natural number $i$ for which $y = f(i) = x_i$. However, we have $x_{i+1} \in x_i = y$
and $x_{i+1} \in x$, and therefore $x_{i+1} \in y \cap x$. Contradiction.


In particular, there is no set $x$ such that $x \in x$ (otherwise, the constant sequence
$x_i = x$ would contradict the previous lemma).

The axiom of foundation is, in presence of the other axioms, equivalent to
the following, better looking, axiom of induction, which is a variant of transfinite
induction (sometimes called $\in$-induction):


$$(\forall x.(\forall y.\, y \in x \Rightarrow A(y)) \Rightarrow A(x)) \Rightarrow \forall x.A(x)$$


for every predicate $A$ with $\mathrm{FV}(A) \subseteq \{x\}$.
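
The computational counterpart of $\in$-induction is recursion on the elements of a set: to compute a value for $x$, we may use the values already computed for all the $y \in x$. As a small illustration, here is an OCaml sketch on hereditarily finite sets, which are represented (this is an assumption made here) by the list of their elements. The function `rank` is a hypothetical example, computing the length of the longest descending chain of memberships starting from a set.

```ocaml
(* Hereditarily finite sets: a set is given by the list of its elements. *)
type set = Set of set list

(* Rank of a set, defined by recursion on its elements: the recursive
   calls are only performed on elements y of x. *)
let rec rank (Set l) =
  List.fold_left (fun r y -> max r (rank y + 1)) 0 l
```

For instance, the rank of `Set []` is `0` and the rank of `Set [Set []; Set [Set []]]` (the von Neumann encoding of 2) is `2`. The recursion terminates precisely because there is no infinite descending sequence of memberships.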


Avoiding Russell’s paradox. Intuitively, the way ZF avoids Russell’s paradox is 
by considering that the collections such as the collection of all sets are “too 
big” to be sets themselves: they are sometimes called classes and a set can be 
considered as a “small class”. 

For instance, we have the restricted schema of comprehension, but not the
unrestricted one, which would allow defining the set of all sets as $\{x \mid \top\}$: we
can only consider the collection of elements satisfying some property within a
set, i.e. a subset of a small class is itself small. This is also the reason why, in the
axiom schema of replacement, we require $A(x, y)$ to be functional: otherwise,
for a given $x$ the collection $\{y \mid A(x, y)\}$ would not be guaranteed to be a set (it
could be too big). Also, we have seen that the axiom of foundation ensures that no set
contains itself, which prevents classes, such as the collection of all sets, from
being sets themselves.


The axiom of choice. The axiom of choice states that given a collection $x$ of
non-empty sets, we can pick an element in each of the sets:


$$\forall x.\, \emptyset \notin x \Rightarrow \exists (f : x \to \bigcup x).\forall y \in x.\, f(y) \in y$$


This states that given a set $x$ of non-empty sets (i.e. $x$ does not contain $\emptyset$),
there exists a function $f$ from $x$ to $\bigcup x$ (the union of the elements of $x$, which
can be constructed with the axiom of union) which, for every set $y \in x$, picks an
element of $y$ (i.e. $f(y) \in y$): this is called a choice function for $x$. The careful
reader will notice that the existence of a function is not a formal statement of
our language but it can be encoded: the formula $\exists (f : x \to y).A$, asserting the
existence of a function $f$ from $x$ to $y$ such that $A$, is a notation for a formula of
the form


$$\exists f.\, f \subseteq x \times y \land \ldots$$

which would state (details left to the reader) the existence of a subset $f$ of $x \times y$
which, as a relation, encodes a total function such that $A$ is satisfied.

The axiom of choice has a number of classically equivalent formulations 
among which 


— every relation defined everywhere contains a function, 
— every surjective function admits a section, 

— the product of a family of non-empty sets is non-empty, 
— every set can be well-ordered, 


and so on. 


5.3.3 Intuitionistic set theory. Set theory, as any other theory, can also
be considered within intuitionistic first order logic, in which case it is called
IZF. The reason for this is the usual one: we want to be able to exhibit explicit
witnesses when constructing elements of sets. We will however see that there is
a price to pay for this, which is that things behave much differently than usual:
intuitionism is not necessarily intuitive, see [Bau17] for a very good general
introduction to the subject.


Equivalent formulations of excluded middle. Most of the proofs related to the
excluded middle in set theory are using the following simple observation. Given
a proposition $A$, which might contain any free variable except $y$, consider the
set


$$x = \{y \in \mathbb{N} \mid A\}$$


Then, given a natural number $y \in \mathbb{N}$, we have $y \in x$ if and only if $A$ holds:

$$(y \in x) \Leftrightarrow A \qquad (5.1)$$


(in practice, we often use $y = 0$ as arbitrary natural number). In particular,
when $A = \bot$, the set $x$ is (by definition) the empty set $\emptyset$ and we have $y \in \emptyset$
if and only if $\bot$. By the axiom of extensionality, we thus have that $x = \emptyset$ is
equivalent to $\forall y \in \mathbb{N}.(y \in x) \Leftrightarrow (y \in \emptyset)$, which is equivalent to $A \Leftrightarrow \bot$, which is
equivalent to $A \Rightarrow \bot$ (since $\bot \Rightarrow A$ always holds): we have shown


$$(x = \emptyset) \Leftrightarrow \lnot A \qquad (5.2)$$


For instance, a typical thing we cannot do in IZF is test whether an element 
belongs to a given set or not: 


Lemma 5.3.3.1. In IZF, the formula


$$\forall y.\forall x.(y \in x) \lor (y \notin x)$$


is satisfied if and only if the law of excluded middle is.


Proof. The right-to-left implication is obvious. For the left-to-right implication,
given any formula $A$, consider the natural number $y = 0$ and the set
$x = \{y \in \mathbb{N} \mid A\}$: the set $x$ contains the element $0$ if and only if $A$ holds. We
conclude using (5.1): the above proposition would imply


$$(0 \in \{y \in \mathbb{N} \mid A\}) \lor (0 \notin \{y \in \mathbb{N} \mid A\})$$


which is equivalent to

$$A \lor \lnot A$$


and we conclude.


The intuition behind this result is the following one. In a constructive world,
an element of $x = \{y \in \mathbb{N} \mid A\}$ consists of an element of $\mathbb{N}$ together with a
proof that $A$ holds. Therefore, in order to decide whether $0$ belongs to $x$ or not,
we have to decide whether $A$ holds or not.

Considering the variant of the excluded middle recalled in lemma 2.3.5.3, 
similarly, we cannot test a set for emptiness either: 


Lemma 5.3.3.2. In IZF, the formula

$$\forall x.(x = \emptyset) \lor (x \neq \emptyset)$$

is satisfied if and only if we can prove

$$\lnot A \lor \lnot\lnot A$$


for every formula $A$.


Proof. The right-to-left implication is clear. For the left-to-right implication,
given a formula $A$, consider the set $x = \{y \in \mathbb{N} \mid A\}$. We conclude using (5.2):
we have $x = \emptyset$ if and only if $0 \in \{y \in \mathbb{N} \mid A\} \Leftrightarrow 0 \in \{y \in \mathbb{N} \mid \bot\}$, if and only if
$A \Leftrightarrow \bot$, if and only if $\lnot A$.


More generally, we do not expect to be able to decide equality either: the formula


$$\forall x.\forall y.(x = y) \lor (x \neq y)$$


would imply that we can test for emptiness as a particular case. Of course, this
does not imply that we cannot decide the equality of some particular sets. For
instance, one can show that $0 = \emptyset \neq \{\emptyset\} = 1$ (because $\emptyset$ belongs to $1$ but not
to $0$) and therefore, writing


$$\mathbb{B} = \{0, 1\} = \{x \in \mathbb{N} \mid x = 0 \lor x = 1\}$$


for the set of booleans, we can decide the equality of booleans. By a similar
reasoning, we can decide the equality of natural numbers.

Many other “unexpected” properties of IZF (compared to the classical case)
can be proved along similar lines. For instance, the finiteness of subsets of a
finite set is equivalent to being classical. By a finite set, we mean here a set $x$
for which there is a natural number $n$ and a bijection $f : \{0, \ldots, n - 1\} \to x$.


Lemma 5.3.3.3. In IZF, every subset of a finite set is finite if and only if the law
of excluded middle is satisfied.


Proof. Suppose that every subset of a finite set is finite. Given a property
$A$, consider the set $x = \{y \in \mathbb{B} \mid A\}$, which is a subset of the finite set $\mathbb{B}$ of
booleans. By hypothesis, this set is finite and therefore there exists a natural
number $n$ and a function $f$ as above. Since we can decide equality for natural
numbers as argued above, we have either $n = 0$ or $n \neq 0$: in the first case $x = \emptyset$
and thus $\lnot A$ holds, in the second case, $f(0) \in x$ and thus $A$ holds. We therefore
have $A \lor \lnot A$. Conversely, in classical logic, every subset of a finite set is finite,
as everybody knows.


The axiom of choice. Seen from a constructive perspective the axiom of choice
is quite dubious: it allows the construction of an element in each set of a family
of non-empty sets, without having to provide any hint at how such an element
could be constructed. In particular, given a non-empty set $x$, the axiom of
choice provides a function $f : \{x\} \to x$, i.e. an element of $x$ (the image of $x$
under $f$), and allows proving


$$x \neq \emptyset \Rightarrow \exists y.\, y \in x$$


i.e. we can construct an element in x by only knowing that there exists one. 
This is precisely the kind of behavior we invoked in section 2.5.2 in order to 
motivate the fact that double negation elimination was not constructive. In 
fact, we will see below that having the axiom of choice implies that the ambient 
logic is classical. 

Another reason why the axiom of choice can be questioned is that it allows
proving quite counter-intuitive results, the most famous perhaps being the
Banach-Tarski theorem recalled below. Two sets $A$ and $B$ of points in $\mathbb{R}^3$ are
congruent if one can be obtained from the other by an isometry, i.e. by using
translations, rotations and reflections.


Theorem 5.3.3.4 (Banach-Tarski). Given two bounded subsets $A$ and $B$ of $\mathbb{R}^3$ with
non-empty interior, there are partitions


$$A = A_1 \uplus \ldots \uplus A_n \qquad B = B_1 \uplus \ldots \uplus B_n$$


such that $A_i$ is congruent to $B_i$ for $1 \leq i \leq n$.


Proof. Using the axiom of choice and other ingredients... 


In particular, consider the case where $A$ is a ball in $\mathbb{R}^3$ and $B$ is two copies of
the ball A. The theorem states that there is a way to partition the ball A and 
move the subsets of the partition using isometries only, in order to make two 
balls. If you try this at home, you should convince yourself that there is no easy 
way to do so. 

For such reasons, people started to investigate the status of the axiom of
choice with respect to ZF. In 1938, Gödel constructed a model of ZFC (i.e. a
model of ZF satisfying the axiom of choice) inside an arbitrary model of ZF,
thus showing that ZFC is consistent if ZF is [Göd38]. In 1963, Cohen showed
that the situation is similar with the negation of the axiom of choice. The 
axiom of choice is thus independent of ZF: neither this axiom nor its negation 
is a consequence of the axioms of ZF and one can add it or its negation without 
affecting consistency. 

Constructivists however will reject the axiom of choice, because it implies 
the excluded middle: 
Theorem 5.3.3.5. In IZF with the axiom of choice, the law of elimination of
double negation holds.


Proof. Fix a formula $A$ and suppose $\lnot\lnot A$ holds. The set

$$x = \{y \in \mathbb{N} \mid A\}$$


is not empty. Namely, we have seen in (5.2) that $x = \emptyset$ implies $\lnot A$ which,
together with the hypothesis $\lnot\lnot A$, implies $\bot$. By the axiom of choice, the fact
that $x \neq \emptyset$ implies the existence of an element of $x$ because we have a choice
function for $\{x\}$, which implies $A$ by (5.1). Therefore $\lnot\lnot A \Rightarrow A$.


The above proof is not entirely satisfactory because it uses the following form 
of the axiom of choice: 


any set y, whose elements x are not empty, admits a choice function. 


The issue here is that we suppose that an element x of y is not empty, i.e. it is 
not the case that x does not contain an element. From a constructive point of 
view, this is not equivalent to supposing that it contains an element (not not 
containing an element does not mean that we contain an element, because we 
do not admit double negation elimination) and the latter is more constructive. 
A better formulation of the axiom of choice would thus be 


any set y, whose elements x contain an element, admits a choice 
function. 


This hints at the fact that we should be careful in IZF about what we mean by 
the axiom of choice: formulations which were equivalent in classical logic are not 
any more in intuitionistic logic. This second formulation of the axiom of choice 
still implies the excluded middle as first noticed by Diaconescu [Dia75, GM78], 
but this is much more subtle: 


Theorem 5.3.3.6 (Diaconescu). In IZF with the axiom of choice, the law of 
excluded middle is necessarily satisfied. 


Proof. Fix an arbitrary formula $A$: we are going to show $\lnot A \lor A$. Consider the
sets


$$x = \{z \in \mathbb{B} \mid (z = 0) \lor A\} \qquad\text{and}\qquad y = \{z \in \mathbb{B} \mid (z = 1) \lor A\}$$


Those sets are not empty since $0 \in x$ and $1 \in y$. By the axiom of choice, there
is therefore a function $f : \{x, y\} \to \mathbb{B}$ such that $f(x) \in x$ and $f(y) \in y$. Now,
$f(x)$ and $f(y)$ are booleans, where equality is decidable, so that we can reason
by case analysis on those.


— If $f(x) = f(y) = 0$ then $0 \in y$ thus $(0 = 1) \lor A$ holds, thus $A$ holds.


— If $f(x) = f(y) = 1$ then $1 \in x$ thus $(1 = 0) \lor A$ holds, thus $A$ holds.


— If $f(x) = 0 \neq 1 = f(y)$ then $x \neq y$ (otherwise, $f(x) = f(y)$ would hold),
and we have $\lnot A$: namely, supposing $A$, we have $x = y = \mathbb{B}$, and thus $\bot$
since $x \neq y$.

— If $f(x) = 1 \neq 0 = f(y)$ then we can show both $A$ and $\lnot A$ as above (so
that this case cannot happen).


Therefore, we have $\lnot A \lor A$.


This motivates, for the reader interested in intuitionistic logic, which we 
hope you are by now, the exploration of set theory without choice, but you 
should be warned that this theory behaves much differently than usual. For 
instance, Blass has shown the following result [Bla84]: 


Theorem 5.3.3.7. In ZF, the axiom of choice is equivalent to the fact that every 
vector space has a basis. 


In fact, we know models of ZF where there is a vector space admitting no basis,
and one admitting two bases of different cardinalities.


Synthetic differential geometry. Since classical logic is obtained by adding ax- 
ioms (e.g. excluded middle) to intuitionistic logic, a proof in intuitionistic logic 
is valid in classical logic (we are not using the extra axioms). Therefore, one 
is tempted to think that intuitionistic logic is less powerful than classical logic, 
because it can prove less. Well, this is true, but this can also be seen as a 
strength: this also means that we have more models of intuitionistic theories 
than their classical counterparts. We would like to give an illustration of this. 

The notion of infinitesimal is notoriously difficult to define in analysis. Intuitively,
such a quantity is so small that it should be “almost 0”; in particular,
it should be smaller than any usual strictly positive real number. Having such
a notion is quite useful. For instance, we expect the derivative of a function
$f : \mathbb{R} \to \mathbb{R}$ at $x$ to be defined as


$$f'(x) = (f(x + \varepsilon) - f(x))/\varepsilon$$


for any non-zero infinitesimal $\varepsilon$. Namely, $f'(x)$ should be the slope of the line
tangent to the graph of $f$ at $x$, i.e.


$$f(x + \varepsilon) = f(x) + f'(x)\varepsilon$$


More precisely, by “almost 0”, we mean here that it should capture first-order
variations, i.e. it should be so small that $\varepsilon^2 = 0$. If we are ready to accept
the existence of such entities, we find out that computations which traditionally
involve subtle concepts such as limits, become simple algebraic manipulations.
For instance, consider the function $f(x) = x^2$. We have


$$f(x + \varepsilon) = (x + \varepsilon)^2 = x^2 + 2x\varepsilon + \varepsilon^2 = x^2 + 2x\varepsilon$$


and therefore we should have $f'(x) = 2x$, as expected.
This suggests that we define the set of infinitesimals as


$$D = \{\varepsilon \in \mathbb{R} \mid \varepsilon^2 = 0\}$$


and postulate the following principle of microaffineness:

Axiom 5.3.3.8. Every function $f : D \to \mathbb{R}$ is of the form


$$f(\varepsilon) = a + b\varepsilon$$


for some unique reals $a$ and $b$.
Once this axiom is postulated, we necessarily have $a = f(0)$ and we can define
$f'(x)$ to be the coefficient $b$ associated with the function $\varepsilon \mapsto f(x + \varepsilon)$.
We have already given an example of such a
computation above. We can similarly compute the derivative of a product of
two functions by


$$(f \times g)(x + \varepsilon) = f(x + \varepsilon)\,g(x + \varepsilon) = (f(x) + f'(x)\varepsilon)(g(x) + g'(x)\varepsilon)$$
$$= f(x)g(x) + (f'(x)g(x) + f(x)g'(x))\varepsilon + f'(x)g'(x)\varepsilon^2$$
$$= f(x)g(x) + (f'(x)g(x) + f(x)g'(x))\varepsilon$$


and therefore $(f \times g)'(x) = f'(x)g(x) + f(x)g'(x)$ as expected. Similarly, the
derivative of the composite of two functions can be computed by


$$g(f(x + \varepsilon)) = g(f(x) + f'(x)\varepsilon) = g(f(x)) + g'(f(x))f'(x)\varepsilon$$

because $f'(x)\varepsilon$ is easily shown to be an infinitesimal and therefore

$$(g \circ f)'(x) = g'(f(x))f'(x)$$
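
The algebraic manipulations above can be mimicked on a computer with dual numbers, i.e. formal expressions $a + b\varepsilon$ computed modulo $\varepsilon^2 = 0$. The OCaml sketch below rests on assumptions of its own: it uses floating point numbers in place of reals, treats $\varepsilon$ as a formal symbol rather than as an element of $\mathbb{R}$, and uses hypothetical names (`dual`, `const`, `add`, `mul`, `derivative`). It illustrates the computations only, and is not a model of the axiom.

```ocaml
(* A dual number re + eps·ε, computed modulo ε² = 0. *)
type dual = { re : float; eps : float }

(* A real number seen as a dual number. *)
let const a = { re = a; eps = 0. }

(* (a + bε) + (c + dε) = (a + c) + (b + d)ε *)
let add x y = { re = x.re +. y.re; eps = x.eps +. y.eps }

(* (a + bε)(c + dε) = ac + (ad + bc)ε, since ε² = 0 *)
let mul x y =
  { re = x.re *. y.re; eps = x.re *. y.eps +. x.eps *. y.re }

(* f'(x) is the coefficient of ε in f(x + ε). *)
let derivative f x = (f { re = x; eps = 1. }).eps
```

For instance, `derivative (fun x -> mul x x) 3.` evaluates to `6.`, in accordance with $f'(x) = 2x$ for $f(x) = x^2$.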


This is wonderful, except that our microaffineness seems to be clearly wrong.
Namely,


$$\varepsilon^2 = 0 \quad\text{implies}\quad \varepsilon = 0$$


thus $D = \{0\}$, and therefore any coefficient $b$ would suit. However... the above
implication uses classical reasoning. Namely: if $\varepsilon \neq 0$, we have


$$\varepsilon = \varepsilon^2/\varepsilon = 0/\varepsilon = 0$$


from which we can conclude that $\varepsilon = 0$... in classical logic! In intuitionistic
logic, all that we have proved is that


$$\lnot\lnot(\varepsilon = 0)$$


This is the sense in which $\varepsilon$ is infinitesimal: it is not nonzero.

This shows that there is no obvious contradiction in our axiomatic if we work 
in intuitionistic logic, but it does not prove that there is no contradiction. This 
can however be done by constructing models. The field of synthetic differential 
geometry takes this idea of working in intuitionistic logic in order to define 
infinitesimals as a starting point to study differential geometry [Bel98, Koc06]. 


5.4 Unification 


Suppose fixed a signature. Given two terms $t$ and $u$, a very natural question
is: is there a way to substitute their variables in order to make them equal? In
other words, we are trying to solve the equation


$$t = u$$


One quickly finds out that there is quite often an infinite number of solutions,
and we refine the question to: is there a “smallest” way of substituting the
variables of $t$ and $u$ in order to make them equal? We explain here how
to properly formulate this problem, occurrences of which have already been
encountered in section 4.4 (see in particular section 4.4.2), and exhibit an
algorithm in order to solve it. A detailed introduction
to the subject can be found in [BN99].


5.4.1 Equation systems. An equation is a pair of terms $(t, u)$ often written

$$t \overset{?}{=} u$$


where $t$ and $u$ are respectively called the left and right member of the equation.
A substitution $\sigma$, see section 5.1.3, is a solution of the equation when


$$t[\sigma] = u[\sigma]$$


in which case we also say that $\sigma$ is a unifier of $t$ and $u$. An equation system, or
unification problem, $E$ is a finite set of equations. A substitution $\sigma$ is a solution
(or a unifier) of $E$ when it is a solution of every equation in $E$. We write $E[\sigma]$
for the equation system obtained by applying a substitution $\sigma$ to every member
of an equation of $E$: $\sigma$ is thus a solution of $E$ when all the equations of $E[\sigma]$
are of the form $t \overset{?}{=} t$.


Example 5.4.1.1. Let us give some examples of unifiers. We suppose that our
signature comprises two binary function symbols $f$ and $g$, and two nullary symbols
$a$ and $b$.


— $f(x, b()) \overset{?}{=} f(a(), y)$ has one unifier: $[a()/x, b()/y]$,

— $x \overset{?}{=} f(y, z)$ has many unifiers: $[f(y, z)/x]$, $[f(a(), z)/x, a()/y]$, etc.

— $f(x, y) \overset{?}{=} g(x, y)$ has no unifier,

— $x \overset{?}{=} f(x, y)$ has no unifier.


Since the solution to an equation system is not unique in general, we can wonder
whether there is a best one in some sense when there is one. We will see that it
is indeed the case.


5.4.2 Most general unifier. A preorder ≤ is a reflexive and transitive relation;
a partial order is an antisymmetric preorder. We can define a preorder ≤
on substitutions by setting σ ≤ τ whenever there exists a substitution σ′ such
that τ = σ′ ∘ σ.


Example 5.4.2.1. With

    σ = [f(y)/x]        σ′ = [g(a, x)/y]        τ = [f(g(a, x))/x, g(a, x)/y]

we have σ′ ∘ σ = τ and thus σ ≤ τ.
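The composition used in this example can be checked mechanically. The following OCaml sketch assumes that substitutions are association lists, as in section 5.4.4; the function compose is a hypothetical helper of ours, and a is simply encoded as a variable named "a":

```ocaml
type term =
  | Var of string
  | App of string * term list

let rec app s = function
  | Var x -> (try List.assoc x s with Not_found -> Var x)
  | App (f, tt) -> App (f, List.map (app s) tt)

(* compose s' s behaves like s' ∘ s: apply s first, then s'. *)
let compose s' s =
  List.map (fun (x, t) -> x, app s' t) s
  @ List.filter (fun (x, _) -> not (List.mem_assoc x s)) s'

(* The three substitutions of the example. *)
let sigma = ["x", App ("f", [Var "y"])]
let sigma' = ["y", App ("g", [Var "a"; Var "x"])]
let tau =
  ["x", App ("f", [App ("g", [Var "a"; Var "x"])]);
   "y", App ("g", [Var "a"; Var "x"])]
```

With these definitions, compose sigma' sigma is equal to tau, which witnesses σ ≤ τ.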


A renaming is a substitution which replaces variables by variables (as opposed
to general terms). The relation ≤ defined above is “almost” a partial order, in
the sense that it would be so if we considered substitutions up to renaming:

Lemma 5.4.2.2. Given substitutions σ and τ, we have both σ ≤ τ and τ ≤ σ if
and only if there exists a renaming σ′ such that σ′ ∘ σ = τ.

Suppose fixed an equation system E. It is easy to see that its set of solutions
is upward closed:

Lemma 5.4.2.3. Given substitutions σ and τ such that σ ≤ τ, if σ is a solution
of E then τ is also a solution of E.

A solution σ of E is a most general unifier when it generates all the solutions
by upward closure, i.e. when τ is a solution of E if and only if σ ≤ τ. We will
see in the next section that when an equation system admits a unifier, it always
admits a most general one, and we have an algorithm to efficiently compute it.
We will thus prove, in a constructive way, the following:

Theorem 5.4.2.4. An equation system E has a solution if and only if it has a
most general unifier.


5.4.3 The unification algorithm. Suppose given an equation system E, for
which we are trying to compute a most general unifier. The idea of the algorithm
is to apply a series of transformations to E, which preserve the set of solutions
of the system, in order to simplify it and compute a solution. More precisely,
our goal is to put the equation system in the following form: an equation system E
is in solved form when

— it is of the form

      E = {x₁ ≟ t₁, …, xₙ ≟ tₙ}

  i.e. all its equations have a variable as left member,
— no variable in a left member of an equation occurs in a right member:
  xᵢ ∉ FV(tⱼ) for all indices i and j,
— variables in left members are distinct: xᵢ = xⱼ implies i = j.

To every equation system in solved form E as above, one can canonically
associate the substitution

    σ_E = [t₁/x₁, …, tₙ/xₙ]

Lemma 5.4.3.1. Given an equation system in solved form E, the substitution σ_E
is a most general unifier of E.
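The three conditions defining a solved form are easy to test. As an illustration, here is a small OCaml sketch, with terms encoded as in section 5.4.4; the function subst_of_solved is a name we introduce here, and it returns the associated substitution when the system is in solved form:

```ocaml
type term =
  | Var of string
  | App of string * term list

let rec occurs x = function
  | Var y -> x = y
  | App (_, tt) -> List.exists (occurs x) tt

(* Returns Some σ_E when e is in solved form, and None otherwise. *)
let subst_of_solved (e : (term * term) list) : (string * term) list option =
  (* keep the equations whose left member is a variable *)
  let s = List.filter_map (function (Var x, t) -> Some (x, t) | _ -> None) e in
  let xs = List.map fst s in
  (* every equation has a variable as left member *)
  let all_vars = List.length s = List.length e in
  (* no left variable occurs in a right member *)
  let not_on_the_right =
    List.for_all (fun x -> List.for_all (fun (_, t) -> not (occurs x t)) s) xs in
  (* left variables are pairwise distinct *)
  let distinct = List.length (List.sort_uniq compare xs) = List.length xs in
  if all_vars && not_on_the_right && distinct then Some s else None
```

For instance, the system {y ≟ g(h(z)), x ≟ h(z)} is accepted, whereas {y ≟ g(x), x ≟ h(z)} is rejected because x occurs in a right member.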


Given an equation system E, the unification algorithm, due to Herbrand [Her30]
and Robinson [Rob65], applies the transformations of figure 5.2, in an arbitrary
order, and terminates when no transformation applies. We write

    E ⇝ E′

to indicate that the transformation replaces E by E′. At some point of the
execution, the algorithm might fail, which we write

    E ⇝ ⊥


Example 5.4.3.2. Suppose that the signature comprises symbols a of arity 0, f
of arity 3, and g and h of arity 1. Consider the equation system

    E = {f(a(), g(x), g(x)) ≟ f(a(), y, g(h(z)))}

Decompose (we propagate equations to subterms):
    {f(t₁,…,tₙ) ≟ f(u₁,…,uₙ)} ∪ E  ⇝  {t₁ ≟ u₁, …, tₙ ≟ uₙ} ∪ E
Clash (different symbols cannot be unified): for f ≠ g,
    {f(t₁,…,tₙ) ≟ g(u₁,…,uₘ)} ∪ E  ⇝  ⊥
Delete (we remove trivial equations):
    {f(t₁,…,tₙ) ≟ f(t₁,…,tₙ)} ∪ E  ⇝  E
Orient (we want variables as left members):
    {f(t₁,…,tₙ) ≟ x} ∪ E  ⇝  {x ≟ f(t₁,…,tₙ)} ∪ E
Occurs-check (we eliminate cyclic equations): when x ∈ FV(t₁) ∪ … ∪ FV(tₙ),
    {x ≟ f(t₁,…,tₙ)} ∪ E  ⇝  ⊥
Propagate (we propagate substitutions): when x ∉ FV(t) and x ∈ FV(E),
    {x ≟ t} ∪ E  ⇝  {x ≟ t} ∪ E[t/x]

Figure 5.2: The unification algorithm.

We have

    E ⇝ {a() ≟ a(), g(x) ≟ y, g(x) ≟ g(h(z))}    by Decompose,
      ⇝ {g(x) ≟ y, g(x) ≟ g(h(z))}               by Delete,
      ⇝ {y ≟ g(x), g(x) ≟ g(h(z))}               by Orient,
      ⇝ {y ≟ g(x), x ≟ h(z)}                     by Decompose,
      ⇝ {y ≟ g(h(z)), x ≟ h(z)}                  by Propagate.


The size |t| of a term is the number of function symbols occurring in it:

    |x| = 0        |f(t₁,…,tₙ)| = 1 + Σ_{i=1}^{n} |tᵢ|
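This definition translates directly into a recursive function. The following OCaml sketch uses the encoding of terms of section 5.4.4; the name size is ours:

```ocaml
type term =
  | Var of string
  | App of string * term list

(* Number of function symbols occurring in a term. *)
let rec size = function
  | Var _ -> 0
  | App (_, tt) -> 1 + List.fold_left (fun n t -> n + size t) 0 tt

(* The left member f(a(), g(x), g(x)) of example 5.4.3.2. *)
let t = App ("f", [App ("a", []); App ("g", [Var "x"]); App ("g", [Var "x"])])
```

The term t contains the symbols f, a, g and g, so that size t is 4.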


Theorem 5.4.3.3. Given any equation system E as input, the unification
algorithm always terminates. It fails if and only if E has no solution, otherwise the
equation system E′ at the end of the execution is in solved form and σ_E′ is a
most general unifier of E.

Proof. This is detailed in [BN99, section 4.6]. Termination can be shown by
observing that the rules make the size of the equation system E decrease: here,
the size is the triple (n₁, n₂, n₃) of natural numbers, ordered lexicographically,
where n₁ is the number of unsolved variables (a variable is solved when it occurs
exactly once in E, as a left member of an equation), n₂ = Σ_{(t ≟ u) ∈ E} (|t| + |u|)
is the size of the equation system and n₃ is the number of equations of the
form t ≟ x in E. The other properties result from the fact that the transformations
preserve the set of unifiers (⊥ has no unifier by convention) and that the
resulting equation system is in solved form.


Example 5.4.3.4. The most general unifier of example 5.4.3.2 is

    [g(h(z))/y, h(z)/x]

Remark 5.4.3.5. The side conditions of Propagate are quite important (and often
forgotten by students when first implementing unification). Without those,
unification problems such as {x ≟ f(x)} would lead to an infinite number of
applications of rules Propagate and Decompose, and thus fail to terminate:

    {x ≟ f(x)} ⇝ {f(x) ≟ f(f(x))} ⇝ {x ≟ f(x)} ⇝ …

The side condition avoids this and the rule Occurs-check makes the unification
fail: the solution would intuitively be the “infinite term”

    f(f(f(…)))

but those are not acceptable here.


In the worst case, the algorithm is exponential in time and space: hint, consider

    {x₁ ≟ f(x₀, x₀), x₂ ≟ f(x₁, x₁), …, xₙ ≟ f(xₙ₋₁, xₙ₋₁)}

but it performs well in practice.
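To get a feeling for the hint, we can compute the term that a unifier in solved form has to assign to the last variable of this system. The OCaml sketch below does not run the unification algorithm: the function solution (a name of ours) directly builds the expected term, and size is the size of terms defined above.

```ocaml
type term =
  | Var of string
  | App of string * term list

let rec size = function
  | Var _ -> 0
  | App (_, tt) -> 1 + List.fold_left (fun n t -> n + size t) 0 tt

(* Term assigned to x_n once all substitutions have been propagated:
   x_0 stays a variable and x_i becomes f(t, t), with t the term for x_(i-1). *)
let rec solution n =
  if n = 0 then Var "x0"
  else
    let t = solution (n - 1) in
    App ("f", [t; t])
```

The size of solution n is 2ⁿ - 1 (for instance 1023 for n = 10), although the system itself only contains n symbols: writing the substitution out explicitly is what makes the algorithm exponential.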
5.4.4 Implementation. Terms can be implemented by the type

type term =
  | Var of string
  | App of string * term list


and we can check whether a variable x occurs in a term t, i.e. if x € FV(t), with 


let rec occurs x = function
  | Var y -> x = y
  | App (f, tt) -> List.exists (occurs x) tt


A substitution can be described as a list of pairs consisting of a variable (here, a 
string) and a term. It can be applied to a term thanks to the following function: 


let rec app s = function 
| Var x -> (try List.assoc x s with Not_found -> Var x) 
| App (f, tt) -> App (f, List.map (app s) tt) 


Unification can finally be performed by the following function, which takes as 
arguments the substitution being constructed (which is initially empty) and the 
equation system (a list of pairs of terms) and returns the most general unifier: 


exception Not_unifiable

let rec unify s = function
  | (App (f, tt), App (g, uu))::e ->
    (* clash *)
    if f <> g then raise Not_unifiable
    (* decompose *)
    else unify s ((List.map2 (fun t u -> t, u) tt uu)@e)
  | (App (f, tt), Var x)::e ->
    (* orient *)
    unify s ((Var x, App (f, tt))::e)
  | (Var x, Var y)::e when x = y ->
    (* delete *)
    unify s e
  | (Var x, t)::e ->
    (* occurs check *)
    if occurs x t then raise Not_unifiable;
    (* propagate: replace x by t in the remaining equations and in s *)
    let s' = [x, t] in
    let e = List.map (fun (t, u) -> app s' t, app s' u) e in
    let s = List.map (fun (y, u) -> y, app s' u) s in
    unify ((x, t)::s) e
  | [] -> s

let unify = unify []


This function raises the exception Not_unifiable when the system has no solu- 
tion. The unifier of example 5.4.3.2 can then be computed with 


let s =
  let t =
    App ("f", [
        App ("a", []);
        App ("g", [Var "x"]);
        App ("g", [Var "x"])
      ]) in
  let u =
    App ("f", [
        App ("a", []);
        Var "y";
        App ("g", [App ("h", [Var "z"])])
      ]) in
  unify [t, u]
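A simple sanity check for such an implementation is to apply the computed substitution to both terms and compare the results. The following self-contained OCaml sketch repeats a compact variant of the functions of this section (with an additional arity test in the clash case, and a helper unifies which is ours) and checks example 5.4.3.2 as well as the failure of remark 5.4.3.5:

```ocaml
type term =
  | Var of string
  | App of string * term list

exception Not_unifiable

let rec occurs x = function
  | Var y -> x = y
  | App (_, tt) -> List.exists (occurs x) tt

let rec app s = function
  | Var x -> (try List.assoc x s with Not_found -> Var x)
  | App (f, tt) -> App (f, List.map (app s) tt)

(* s is the substitution built so far, the list contains the equations. *)
let rec unify s = function
  | [] -> s
  | (App (f, tt), App (g, uu)) :: e ->
    if f <> g || List.length tt <> List.length uu then raise Not_unifiable
    else unify s (List.combine tt uu @ e)
  | (App (f, tt), Var x) :: e -> unify s ((Var x, App (f, tt)) :: e)
  | (Var x, Var y) :: e when x = y -> unify s e
  | (Var x, t) :: e ->
    if occurs x t then raise Not_unifiable;
    let s' = [x, t] in
    let e = List.map (fun (t, u) -> app s' t, app s' u) e in
    unify ((x, t) :: List.map (fun (y, u) -> y, app s' u) s) e

(* true when t and u are unifiable and the result makes them equal *)
let unifies t u =
  match unify [] [t, u] with
  | s -> app s t = app s u
  | exception Not_unifiable -> false

let t = App ("f", [App ("a", []); App ("g", [Var "x"]); App ("g", [Var "x"])])
let u = App ("f", [App ("a", []); Var "y"; App ("g", [App ("h", [Var "z"])])])
```

Here unifies t u holds, and unify [] [t, u] is the substitution [h(z)/x, g(h(z))/y] of example 5.4.3.4, whereas unifies (Var "x") (App ("f", [Var "x"])) is false because of the occurs check.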


5.4.5 Efficient implementation. The major source of inefficiency in the pre- 
vious algorithm is due to substitutions. In order to apply a substitution to a 
term, we have to go through the whole term to find variables to substitute, 
and moreover we have to apply a substitution to all the terms in the Propagate 
phase. Instead, if we are willing to use mutable structures, we can use the trick 
we have already encountered in section 4.4.3: we can have variables be refer- 
ences, so that we can modify their contents, and have them point to terms when 
we want to substitute those. This suggests that we should implement terms as 


type term =
  | Var of var ref
  | App of string * term list
and var =
  | AVar of string
  | Link of term

A variable x is represented as Var r where r is a reference containing AVar "x".
If, later on, we want to substitute it with a term t, we can then modify the 
contents of r to Link t, which means that the variable has been replaced by t. 
When we do so, the contents of all the occurrences of the variable will thus be 
replaced at once. 

While we could implement things in this way (similarly to section 4.4.3), we 
would like to explain another point and give a variant of this implementation. 
When encoding variables in this way, it is important that all the occurrences 
of the variable x contain the same reference, which is error prone: we have to 
ensure that, for a given variable name, the pointed memory cell is always the 
same. In most applications, the precise name of variables does not matter, since
we are usually considering terms up to α-conversion. We can thus consider that
the reference itself is the name of the variable, i.e. the name is the location in
memory, which avoids the previous possibility for errors. Two variables
are now the same when their references are physically the same (i.e. when they
point to the same memory cell, as opposed to having the same contents), and
we should thus compare them using physical equality == instead of the usual
extensional equality =. We can thus rather encode terms as


type term = 
| Var of term option ref 
| App of string * term list 


A variable will initially be Var r with the reference r containing None (we do not
use a string to indicate the name of the variable since the name is not relevant,
only the position in memory where the None is stored is), and substitution with
a term t will amount to replacing this value by Some t. We can thus generate a
fresh variable with

let var () = Var (ref None)

and the right notion of equality between terms is given by the following function


let rec eq t u =
  match t, u with
  | Var {contents = Some t}, u -> eq t u
  | t, Var {contents = Some u} -> eq t u
  | Var x, Var y -> x == y
  | App (f, tt), App (g, uu) ->
    f = g && List.for_all2 eq tt uu
  | _ -> false


where we use the fact that a reference is implemented in OCaml as a record 
with contents as only field, which is mutable. We can check whether a variable 
occurs in a term with 


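let rec occurs x = function
  (* follow links: a substituted variable stands for its contents *)
  | Var {contents = Some t} -> occurs x t
  | Var y -> x == y
  | App (f, tt) -> List.exists (occurs x) tt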


using, as indicated above, physical equality to compare variables, and unification 
can be performed with 


let rec unify t u =
  match t, u with
  | App (f, tt), App (g, uu) ->
    (* clash *)
    if f <> g then raise Not_unifiable
    (* decompose *)
    else List.iter2 unify tt uu
  (* follow links *)
  | Var {contents = Some t}, u -> unify t u
  | t, Var {contents = Some u} -> unify t u
  (* delete *)
  | Var x, Var y when x == y -> ()
  | Var x, u ->
    (* occurs check *)
    if occurs x u then raise Not_unifiable
    (* propagate *)
    else x := Some u
  | _, Var _ ->
    (* orient *)
    unify u t


The unifier of example 5.4.3.2 can then be computed with 


let () =
  let x = var () in
  let y = var () in
  let z = var () in
  let t =
    App ("f", [
        App ("a", []);
        App ("g", [x]);
        App ("g", [x])
      ]) in
  let u =
    App ("f", [
        App ("a", []);
        y;
        App ("g", [App ("h", [z])])
      ]) in
  unify t u
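With this representation, the unifier is not returned as a value: it is stored in the variables themselves. In order to observe it, we can read a term back by following the links. The self-contained OCaml sketch below repeats the definitions of this section in a compact form (the occurs check follows links, and the clash case also compares arities); the function normalize is a hypothetical helper of ours:

```ocaml
type term =
  | Var of term option ref
  | App of string * term list

exception Not_unifiable

let var () = Var (ref None)

(* Occurs check, following the links of already substituted variables. *)
let rec occurs x = function
  | Var {contents = Some t} -> occurs x t
  | Var y -> x == y
  | App (_, tt) -> List.exists (occurs x) tt

let rec unify t u =
  match t, u with
  | App (f, tt), App (g, uu) ->
    if f <> g || List.length tt <> List.length uu then raise Not_unifiable
    else List.iter2 unify tt uu
  | Var {contents = Some t}, u -> unify t u
  | t, Var {contents = Some u} -> unify t u
  | Var x, Var y when x == y -> ()
  | Var x, u -> if occurs x u then raise Not_unifiable else x := Some u
  | _, Var _ -> unify u t

(* Read back a term by replacing substituted variables by their contents. *)
let rec normalize = function
  | Var {contents = Some t} -> normalize t
  | Var _ as x -> x
  | App (f, tt) -> App (f, List.map normalize tt)
```

After unifying the two terms of example 5.4.3.2, normalize y is g(h(z)) and normalize x is h(z), in accordance with example 5.4.3.4.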


5.4.6 Resolution. A typical use of unification is to generalize the resolution 
technique of section 2.5.8 to first-order classical logic [Rob65]. 


Clausal form. In order to do so, we must first generalize the notion of clausal 
form: 


— a literal L is a predicate applied to terms or its negation

      L ::= P(t₁,…,tₙ) | ¬P(t₁,…,tₙ)

  where P is a predicate of arity n and the tᵢ are terms,
— a clause C is a disjunction of literals, i.e. a formula of the form

      C ::= L₁ ∨ L₂ ∨ … ∨ Lₖ

We recall that a theory Θ on a given signature Σ is a set of closed formulas.
Any theory can be put in clausal form in the following sense:

Proposition 5.4.6.1. Given a finite theory Θ on a signature Σ, there is a theory Θ′
on a signature Σ′ such that all the formulas in Θ′ are clauses and the two theories
Θ and Θ′ are equisatisfiable.


Proof. The process of constructing Θ′ from Θ is done in six steps.

1. By lemma 5.1.7.4, we can replace any formula in Θ by an equivalent one
   in prenex form.

2. By iterated use of proposition 5.2.3.10, we can replace any formula of
   Θ by an equisatisfiable one without existential quantification on a larger
   signature Σ′.

3. By proposition 2.5.5.1, we can replace any formula in Θ, which is necessarily
   of the form ∀x₁.…∀xₖ.A where A is an arbitrary formula which does
   not contain any first-order quantification, by an equivalent one where A
   is a conjunction of disjunctions of literals, i.e. of the form

       ∀x₁.…∀xₖ. ⋀_{i=1}^{m} ⋁_{j=1}^{n_i} L_{i,j}

4. By repeated use of the equivalence

       ∀x.(A ∧ B) ⇔ (∀x.A) ∧ (∀x.B)

   we can replace every formula of Θ by a conjunction of universally quantified
   clauses

       ⋀_{i=1}^{m} ∀x₁.…∀xₖ. ⋁_{j=1}^{n_i} L_{i,j}

5. We can then replace every conjunction of clauses by all its universally
   quantified clauses

       ∀x₁.…∀xₖ. ⋁_{j=1}^{n_i} L_{i,j}

6. Finally, we can remove the universal quantifications in formulas if we suppose
   that all the universally quantified variables are distinct, see lemma 5.4.6.2
   below.

The theory Θ′ obtained in this way is equisatisfiable with Θ.

Lemma 5.4.6.2. Given a formula ∀x.A and a theory Θ, the theories Θ ∪ {∀x.A}
and Θ ∪ {A} are equisatisfiable, provided that x ∉ FV(Θ).
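The purely propositional part of this construction (steps 3 to 5) can be illustrated by a short OCaml sketch. It is only a sketch under simplifying assumptions: quantifiers have already been removed, negations have already been pushed to the atoms, and atoms are kept abstract as strings; the types formula and clause and the function clauses are names of ours.

```ocaml
(* Quantifier-free formulas in negation normal form. *)
type formula =
  | Lit of bool * string   (* Lit (true, p) is p and Lit (false, p) is ¬p *)
  | And of formula * formula
  | Or of formula * formula

(* A clause is a list of literals, read as their disjunction. *)
type clause = (bool * string) list

(* Clauses of a formula: the resulting list is read as a conjunction. *)
let rec clauses : formula -> clause list = function
  | Lit (b, p) -> [[b, p]]
  | And (a, b) -> clauses a @ clauses b
  | Or (a, b) ->
    (* distribute ∨ over ∧ *)
    List.concat_map
      (fun c -> List.map (fun d -> c @ d) (clauses b))
      (clauses a)
```

For instance, the formula (A ∧ B) ∨ C gives the two clauses A ∨ C and B ∨ C. Note that distributing disjunctions in this way can make the number of clauses grow exponentially.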


The resolution rule. We can assimilate a theory Γ in clausal form with a first-order
context. The resolution rule of section 2.5.8 can then be modified as
follows in order to account for first-order terms:

    Γ ⊢ C ∨ P(t₁,…,tₙ)      Γ ⊢ ¬P(u₁,…,uₙ) ∨ D
    ──────────────────────────────────────────── (res)
                  Γ ⊢ (C ∨ D)[σ]

where σ is the most general unifier of the equation system

    {t₁ ≟ u₁, …, tₙ ≟ uₙ}

Generalizing lemma 2.5.8.1, this rule is correct:

Lemma 5.4.6.3 (Correctness). If C can be deduced from Γ using the axiom and
resolution rules then the sequent Γ ⊢ C is derivable in classical first-order logic.


Example 5.4.6.4. The standard example is the following one. We know that
— all men are mortal, and
— Socrates is a man,
which can be formalized as the theory

    {∀x.man(x) ⇒ mortal(x), man(Socrates)}

in the signature with a constant symbol Socrates and with two unary predicates
man and mortal. By proposition 5.4.6.1, we can put it in clausal form:

    {¬man(x) ∨ mortal(x), man(Socrates)}

We want to show that this entails that Socrates is mortal. As explained in
lemma 2.5.8.8, this can be done using resolution by showing that adding

    ¬mortal(Socrates)

to the theory makes it inconsistent. And indeed, writing Γ for the resulting
theory, we have

    ──────────────────── (ax)   ─────────── (ax)
     Γ ⊢ ¬man(x) ∨ m(x)          Γ ⊢ man(S)
    ──────────────────────────────────────── (res)   ─────────── (ax)
                   Γ ⊢ m(S)                           Γ ⊢ ¬m(S)
    ───────────────────────────────────────────────────────────── (res)
                               Γ ⊢ ⊥

(we shortened mortal as m and Socrates as S).
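The two resolution steps of this example can be replayed mechanically. The following OCaml sketch is only an illustration under assumptions of ours: a clause is a list of literals, the record type literal and the functions app_lit and resolve are hypothetical names, and the unification function is a compact variant of the one of section 5.4.4.

```ocaml
type term =
  | Var of string
  | App of string * term list

(* A literal is a predicate applied to terms, together with a polarity. *)
type literal = { positive : bool; pred : string; args : term list }

exception Not_unifiable

let rec occurs x = function
  | Var y -> x = y
  | App (_, tt) -> List.exists (occurs x) tt

let rec app s = function
  | Var x -> (try List.assoc x s with Not_found -> Var x)
  | App (f, tt) -> App (f, List.map (app s) tt)

let rec unify s = function
  | [] -> s
  | (App (f, tt), App (g, uu)) :: e ->
    if f <> g || List.length tt <> List.length uu then raise Not_unifiable
    else unify s (List.combine tt uu @ e)
  | (App (f, tt), Var x) :: e -> unify s ((Var x, App (f, tt)) :: e)
  | (Var x, Var y) :: e when x = y -> unify s e
  | (Var x, t) :: e ->
    if occurs x t then raise Not_unifiable;
    let s' = [x, t] in
    let e = List.map (fun (t, u) -> app s' t, app s' u) e in
    unify ((x, t) :: List.map (fun (y, u) -> y, app s' u) s) e

let app_lit s l = { l with args = List.map (app s) l.args }

(* Resolvents of c = C ∨ P(t…) and d = ¬P(u…) ∨ D, for every choice of the
   two literals. The clauses are assumed to have disjoint variables. *)
let resolve c d =
  List.concat_map (fun l ->
      List.filter_map (fun m ->
          if l.positive && not m.positive && l.pred = m.pred
             && List.length l.args = List.length m.args
          then
            (match unify [] (List.combine l.args m.args) with
             | s ->
               let rest = List.filter ((<>) l) c @ List.filter ((<>) m) d in
               Some (List.map (app_lit s) rest)
             | exception Not_unifiable -> None)
          else None) d) c

(* The clauses of the example. *)
let man t = { positive = true; pred = "man"; args = [t] }
let mortal t = { positive = true; pred = "mortal"; args = [t] }
let neg l = { l with positive = false }
let socrates = App ("Socrates", [])
let step1 = resolve [man socrates] [neg (man (Var "x")); mortal (Var "x")]
let step2 = resolve [mortal socrates] [neg (mortal socrates)]
```

The first step produces the single clause mortal(Socrates), and the second one produces the empty clause, which plays the role of ⊥.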


The factoring rule. As is, the resolution rule is not complete (see example 5.4.6.6
below). We can however make the system complete by adding the following
factoring rule

    Γ ⊢ C ∨ P(t₁,…,tₙ) ∨ P(u₁,…,uₙ)
    ──────────────────────────────── (fac)
        Γ ⊢ (C ∨ P(t₁,…,tₙ))[σ]

where σ is the most general unifier of {t₁ ≟ u₁, …, tₙ ≟ uₙ}. With this rule,
the completeness theorem 2.5.8.7 generalizes as follows:

Theorem 5.4.6.5 (Refutation completeness). A set Γ of clauses is not satisfiable
if and only if we can show Γ ⊢ ⊥ using axiom, resolution and factoring
rules only.

Example 5.4.6.6. Given a unary predicate P, consider the theory

    Γ = {P(x) ∨ P(y), ¬P(x) ∨ ¬P(y)}

which is not satisfiable. The resolution rule only allows us to deduce the clauses
P(x) ∨ ¬P(x) and P(y) ∨ ¬P(y), from which we cannot deduce any other clause:
without factoring, the resolution rule is not complete. With factoring, we can
show that Γ is inconsistent by

    ─────────────── (ax)     ─────────────── (ax)
    Γ ⊢ P(x) ∨ P(y)          Γ ⊢ P(x) ∨ P(y)
    ─────────────── (fac)    ─────────────── (fac)   ───────────────── (ax)
       Γ ⊢ P(x)                 Γ ⊢ P(x)             Γ ⊢ ¬P(x) ∨ ¬P(y)
                             ───────────────────────────────────────── (res)
                                            Γ ⊢ ¬P(y)
    ────────────────────────────────────────────────────────────────── (res)
                                  Γ ⊢ ⊥


Instead of adding the factoring rule, in order to gain refutation completeness 
the resolution rule can also be modified in order to unify multiple literals at 
once. 
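In the same spirit as for resolution, the factoring rule can be sketched in OCaml. The representation of clauses as lists of literals, the record type literal and the function factors are assumptions of ours, and unification is again a compact variant of the function of section 5.4.4.

```ocaml
type term =
  | Var of string
  | App of string * term list

type literal = { positive : bool; pred : string; args : term list }

exception Not_unifiable

let rec occurs x = function
  | Var y -> x = y
  | App (_, tt) -> List.exists (occurs x) tt

let rec app s = function
  | Var x -> (try List.assoc x s with Not_found -> Var x)
  | App (f, tt) -> App (f, List.map (app s) tt)

let rec unify s = function
  | [] -> s
  | (App (f, tt), App (g, uu)) :: e ->
    if f <> g || List.length tt <> List.length uu then raise Not_unifiable
    else unify s (List.combine tt uu @ e)
  | (App (f, tt), Var x) :: e -> unify s ((Var x, App (f, tt)) :: e)
  | (Var x, Var y) :: e when x = y -> unify s e
  | (Var x, t) :: e ->
    if occurs x t then raise Not_unifiable;
    let s' = [x, t] in
    let e = List.map (fun (t, u) -> app s' t, app s' u) e in
    unify ((x, t) :: List.map (fun (y, u) -> y, app s' u) s) e

let app_lit s l = { l with args = List.map (app s) l.args }

(* Factors of a clause: for every pair of positive literals P(t…) and P(u…)
   which can be unified, drop the second one and apply the unifier. *)
let rec factors = function
  | [] -> []
  | l :: c ->
    let here =
      List.filter_map (fun m ->
          if l.positive && m.positive && l.pred = m.pred
             && List.length l.args = List.length m.args
          then
            (match unify [] (List.combine l.args m.args) with
             | s -> Some (List.map (app_lit s) (l :: List.filter ((<>) m) c))
             | exception Not_unifiable -> None)
          else None) c
    in
    here @ List.map (fun c' -> l :: c') (factors c)

let p t = { positive = true; pred = "P"; args = [t] }
```

On the clause P(x) ∨ P(y) of example 5.4.6.6, factors returns the single clause P(y), which is the clause P(x) of the derivation above up to renaming of variables.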


CHAPTER 6 


Agda 


6.1 What is Agda? 


Agda is both a programming language and a proof assistant, originally devel- 
oped by Norell in 2007. On the surface, it resembles a standard functional 
programming language such as OCaml or Haskell. However, it was designed 
with the Curry-Howard correspondence in mind, see chapter 4, extended to a 
much richer logic than propositional or first-order logic: it uses dependent types, 
which will be the object of chapter 8. This means that the types can express 
pretty much any proposition as a type and a program can be considered as a 
way of proving such a proposition. In this sense the language can also be con- 
sidered as a proof assistant. We start by writing a type, which can be read as 
a formula, and gradually construct a program of this type, which can be read 
as a proof of the formula. The type checking algorithm of Agda will verify that 
the program actually admits the given type, i.e. that our proof is correct! 

A first introduction to Agda is given in sections 6.2 and 6.3, inductive types 
are presented in section 6.4 for data types and section 6.5 for logical connectives, 
we discuss the formalization of equality in section 6.6, the use of Agda to prove 
the correctness of programs in section 6.7 and the issues related to termination 
in section 6.8. 


6.1.1 Features of proof assistants. We shall first present some of the general 
features that Agda has or does not have. There is no room here for a detailed
comparison with other proof assistants; the interested reader can find details in
[Wie06] for instance. In passing, we will simply mention some differences with
the main competitors, which are currently Coq and Lean, and operate similarly
from our point of view. Other well-known proof assistants include ACL2, HOL
Light, Isabelle, Mizar, PVS, etc. 


No type inference. A first difference with functional programming languages 
(e.g. OCaml) is that the typing is so rich in proof assistants that there are no 
principal types and typability is undecidable. There is thus very limited support 
for type inference and we have to explicitly provide a type for all functions. The 
more precise the type for a function is, the longer implementing the program will 
take, but the stronger the guarantees will be. For instance, a sorting algorithm 
can be given the type 


    List A → List A

as usual, but also the type

    List A → SortedList A

i.e. the type expresses the fact that the output is a sorted list (the type of sorted
lists can be defined in the language). The second type is much more precise than
the first one, and it will be more involved to define a function of the second
type than of the first (although not considerably so).
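To give an idea of what a more precise type buys us, here is a rough OCaml approximation of the second type. It is only an analogy: the invariant is enforced by hiding the representation behind an abstract type, it is not proved by the typechecker as it would be in Agda. The module Sorted and its functions are hypothetical names of ours.

```ocaml
module Sorted : sig
  type t                        (* abstract: lists known to be sorted *)
  val of_list : int list -> t   (* the only way to build a value of type t *)
  val to_list : t -> int list
end = struct
  type t = int list
  let of_list l = List.sort compare l
  let to_list l = l
end

(* The type of sort now records that its output is sorted. *)
let sort : int list -> Sorted.t = Sorted.of_list
```

A client of this module cannot build a value of type Sorted.t from an unsorted list, so any function receiving such a value may rely on the invariant.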


Programs vs tactics. The Agda code looks pretty much like a program in a
functional programming language. For instance, the proof of A × B → A is, as
expected, a program which takes a pair (a, b) and returns a:

open import Data.Product
postulate A B : Set

proj : A × B → A
proj (a , b) = a


which is easily compared with the corresponding definition in the OCaml toplevel 


# let proj (a , b) = a;; 
val proj : 'a * 'b -> 'a = <fun> 


On the contrary, Coq uses tactics which describe how to progress into the proof. 
The same proof in Coq would look like this: 


Variables A B : Prop.

Theorem proj : (A * B) -> A.
Proof.
  intro p.
  elim p.
  intro a.
  intro b.
  exact a.
Qed.

It is not clear at all that it is implementing a projection, but the correspondence
with the proof in natural deduction is obvious. The tactics precisely correspond
to the rules, when read bottom-up: the intro commands correspond to
the introduction rule for ⇒, elim to a variant of the usual elimination rule for ∧,
and exact to the axiom rule:

    ────────────────────────── (ax)
    p : A∧B, a : A, b : B ⊢ A
    ────────────────────────── (⇒I)
    p : A∧B, a : A ⊢ B ⇒ A
    ────────────────────────── (⇒I)
    p : A∧B ⊢ A ⇒ B ⇒ A
    ────────────────────────── (∧E)
    p : A∧B ⊢ A
    ────────────────────────── (⇒I)
    ⊢ A∧B ⇒ A

The difference between the two is mostly a matter of taste: both are quite
convenient to use and have the same expressive power. The reason we chose to use
Agda in this course is that it makes clearer the Curry-Howard correspondence,
which is one of the main objects of this course.

Automation. There is however one main advantage of using tactics over programs:
it allows more easily for automation, i.e. Coq can automatically build
parts of the proofs for us. For instance, the previous example can be proved in
essentially one line, which will automatically generate all the above steps:


Variables A B : Prop.

Theorem proj : (A * B) -> A.
Proof.
  tauto.
Qed.

As a more convincing example, the following formula over integers

    ∀m ∈ ℤ.∀n ∈ ℤ.(1 + 2 × m) ≠ (n + n)

can also be proved in essentially one line:

Require Import Coq.ZArith.ZArith.
Require Import Coq.micromega.Lia.
Global Open Scope Z_scope.

Theorem thm : forall m n : Z, 1 + 2 * m <> n + n.
Proof.
  intros; lia.
Qed.

(the lia tactic tries to automatically solve goals in linear integer arithmetic).
If we had to do it by hand, we would have needed many steps, using small
intermediate lemmas expressing facts such as n + n = 2 × n, etc. Agda has only
very limited support for automation, although it has been progressing recently
using reflection.


Program extraction. A major feature of Coq is that the typing system allows
performing what is called program extraction: once the program is proved correct,
one can extract the program (in OCaml) and forget about the parts which are 
present only to prove the correctness of the program. In contrast, the support 
for program extraction in Agda is less efficient and more experimental. 


Correctness. It might seem obvious, but let us state this anyway: a proof assis- 
tant should be correct, in the sense that when it accepts a proof then the proof 
should actually be correct. Otherwise, it would be very easy to write a proof 
assistant: 


let () =
  while true do
    let _ = read_line () in
    print_endline "Your proof is correct."
  done

We will see that sometimes the logic implemented in proof assistants is not
consistent for very subtle reasons (for instance, in section 8.2.2): in this case, the
program allows proving ⊥ and thus any formula, and thus essentially amounts
to the above although it is not obvious at all. For modern and well-developed
proof assistants, we however have good reasons to trust that this is not the case,
see below.


Small kernel. An important design point for a proof assistant is that it should 
have a small kernel, whose correctness ensures the correctness of the whole 
program: this is called the de Bruijn criterion. A proof assistant is made of a 
large number of lines of code (roughly 100 000 lines of Haskell for Agda and 
225 000 lines of OCaml for Coq), those lines are written by humans and there is 
always the possibility that there is a bug in the proof assistant. For this reason, 
it is desirable that the part of the software that we really have to trust, its 
“kernel”, which mainly consists in the typechecker, is as small as possible and 
isolated from the rest of the software, so that all the efforts to ensure correctness 
can be focused on this part. For instance, in Coq, a tactic can produce any proof 
in order to automate part of the reasoning: this is not really a problem because, 
in the end, the typechecker will ensure that the proof is correct, so that we do 
not have to trust the tactic. In Coq, the kernel is roughly 10% of the software; 
in Agda, the kernel is a bit larger, because it contains more features (dependent 
pattern matching in particular), which means that programming is easier in 
some aspects, but the trust that we have in the proof checker is a bit lower. 
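The following toy OCaml sketch illustrates this division of labor. It is in no way the actual kernel of Agda or Coq: formulas are restricted to implication and conjunction, proofs are simply typed λ-terms, and all the names (ty, tm, infer, check, tactic) are hypothetical. The only part which has to be trusted is the function check; the function tactic stands for some arbitrary, possibly wrong, automation.

```ocaml
(* Formulas and proof terms. *)
type ty = Atom of string | Imp of ty * ty | And of ty * ty

type tm =
  | Var of string
  | Abs of string * ty * tm
  | App of tm * tm
  | Pair of tm * tm
  | Fst of tm
  | Snd of tm

(* Trusted kernel: infer the formula proved by a term, if any. *)
let rec infer env = function
  | Var x -> List.assoc_opt x env
  | Abs (x, a, t) -> Option.map (fun b -> Imp (a, b)) (infer ((x, a) :: env) t)
  | App (t, u) ->
    (match infer env t, infer env u with
     | Some (Imp (a, b)), Some a' when a = a' -> Some b
     | _ -> None)
  | Pair (t, u) ->
    (match infer env t, infer env u with
     | Some a, Some b -> Some (And (a, b))
     | _ -> None)
  | Fst t -> (match infer env t with Some (And (a, _)) -> Some a | _ -> None)
  | Snd t -> (match infer env t with Some (And (_, b)) -> Some b | _ -> None)

let check t a = infer [] t = Some a

(* Untrusted automation: it always answers with a first projection. *)
let tactic = function
  | Imp (a, _) -> Abs ("p", a, Fst (Var "p"))
  | _ -> Var "?"
```

Given the goal A ∧ B ⇒ A, the proof proposed by tactic is accepted by check, whereas for the goal A ∧ B ⇒ B the (wrong) proposed proof is rejected: a bug in the tactic cannot make us accept an incorrect proof.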

In order to have a small kernel, it is desirable to reuse as much as possible 
existing features; this principle is followed by most proof assistants. For instance 
in OCaml, there is a type bool of booleans, but those could already have been 
implemented using inductive types by 


type bool = False | True 


It is reasonable in OCaml to have a dedicated type for performance reasons
but, in a proof assistant, this would mean more code to trust, which is a bad
thing: if we can encode a feature in some already existing feature, this is good.
In Agda, booleans are actually implemented as above:

data Bool : Set where
  false true : Bool

as well as in Coq:

Inductive bool : Set := false : bool | true : bool.


Bootstrapping. A nice idea in order to gain confidence in the proof checker would
be to bootstrap and prove its correctness inside itself: OCaml is programmed in
OCaml, why couldn’t we prove Agda in Agda? Gödel’s second incompleteness
theorem unfortunately shows that this is impossible. However, a fair amount
can be done, and has been done in the case of Coq [BW97]: the part which
is out of reach is to show the termination of Coq programs inside Coq (we
already faced a similar situation, in the simpler case of Peano arithmetic, see
section 5.2.5).

Termination. A proof assistant should be able to decide, in a finite amount of
time, whether a proof is correct or not. In order to do so, it has to be able
to check that a given function will produce a value. For this reason, all the
functions that you can write in proof assistants such as Agda are total: they
always produce a result in a finite amount of time. In order to ensure this,
heavy restrictions are imposed on the programs which can be implemented in
proof assistants. Firstly, since all functions are total, the language is not
Turing-complete: there are some programs that you can write in usual programming
languages that you will not be able to write in a proof assistant. Fortunately,
those are rare and typically arise when trying to bootstrap as explained above.
Secondly, since the problem of deciding whether a function terminates or not
is undecidable, the proof assistant actually implements conditions which ensure
that accepted programs will terminate, but some terminating programs will
actually get rejected for “no good reason”. These issues are detailed in section 6.8.
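As an illustration of the second point, consider the following OCaml function (the name log2 is ours). It terminates on every input, but we assume here that the criterion used by the proof assistant only accepts recursive calls on structurally smaller arguments, which is not visibly the case for n / 2:

```ocaml
(* Integer logarithm in base 2. The function terminates since n / 2 < n
   whenever n > 1, but the recursive call is on n / 2, which is not a
   structural subterm of n. *)
let rec log2 n = if n <= 1 then 0 else 1 + log2 (n / 2)
```

OCaml accepts this definition without any check, whereas a direct transcription in a proof assistant would typically require an additional argument explaining why the function terminates.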


6.1.2 Installation. In order to use Agda you will need two pieces of software: 
Agda itself and an editor which supports interacting with Agda. The recom- 
mended editor is Emacs. 


Under Linux. On Ubuntu or Debian, installing Agda and Emacs is achieved by 
typing 


sudo apt-get install agda emacs 


(installation under most other distributions should be similar, by using the 
adequate package manager). Alternatively, in order to obtain a cutting-edge 
version, you can install cabal and type 


cabal update 
cabal install Agda 
agda-mode setup 


to compile the latest version of Agda. 


VSCode. For people thinking that Emacs looks too old, a more modern-looking
editor compatible with Agda is Visual Studio Code¹, which is available for most
platforms. In order to activate Agda support, you should also install the dedicated
Agda mode².

Under macOS and Windows. The preferred installation procedure under macOS
and Windows changes from time to time. The latest one can be found in the
documentation³.

¹ https://code.visualstudio.com/
² https://marketplace.visualstudio.com/items?itemName=banacorn.agda-mode
³ https://agda.readthedocs.io/en/latest/getting-started/installation.html


6.2 Getting started with Agda


6.2.1 Getting help. The first place to get started with Agda is the online
documentation, which is quite well written:

https://agda.readthedocs.io/en/latest/


As usual you can also search on the web. In particular, there are also various
forums such as Stack Overflow:

https://stackoverflow.com/questions/tagged/agda


A very good introduction to Agda is [WK19]. 


6.2.2 Shortcuts. When writing a proof in Agda, we do not have to write the
whole program directly: this would be almost impossible in practice. The editor
allows us to leave “holes” in proofs (written ?) and provides us with shortcuts
which can be used in order to fill those holes and refine programs. Below we
provide the most helpful shortcuts, writing C-x for the control key
+ the x key. They might seem a bit difficult to learn at first, but you will see
that they are easy to get along with, and we can live our whole Agda life with only
six shortcuts.

Emacs. We should first recall the main Emacs shortcuts:

    C-x C-s    save file
    C-w        cut
    M-w        copy
    C-y        paste

Atom uses more standard ones.

Agda. The main shortcuts for Agda that we will need are the following ones;
their use is explained below in section 6.2.6.

    C-c C-l        typecheck and highlight the current file
    C-c C-,        get information about the hole under the cursor
    C-c C-space    give a solution
    C-c C-c        case analysis on a variable
    C-c C-r        refine the hole
    C-c C-a        automatic fill
    middle click   definition of the term

A complete list can be found in the online documentation. Shortcuts which are
also sometimes useful are C-c C-. which is like C-c C-, but also shows the
inferred type for the proposed term for a hole, and C-c C-n which normalizes a
term (useful to test computations).


Symbols. Agda allows for using fancy UTF-8 symbols: those are entered using
\ (backslash) followed by the name of the symbol (many names are shared with
LaTeX). Most of them can be found in the documentation. The most useful
ones are for logic

    ∧  \and      ⊤  \top      →  \to       λ  \Gl
    ∨  \or       ⊥  \bot      ∀  \all      ≡  \equiv
    ¬  \neg      ∃  \ex       Π  \Pi       Σ  \Sigma

and some other useful ones are

    ℕ  \bN       ≤  \le       ×  \times    ∈  \in
    ⊎  \uplus    ∎  \qed      ∷  \::

Indices and exponents such as in x₁ and x¹ are respectively typed \_1 and \^1,
and similarly for others.


6.2.3 The standard library. The standard library defines most of the expected
data types. The default path is /usr/share/agda-stdlib and you are
encouraged to have a look in there or in the online documentation. We list
below some of the most useful modules.

Data types. The modules for common data types are:

    Data.Empty    Empty type (⊥)
    Data.Unit     Unit type (⊤)
    Data.Bool     Booleans
    Data.Nat      Natural numbers (ℕ)
    Data.List     Lists
    Data.Vec      Vectors (lists of given length)
    Data.Fin      Types with finite number of elements

Other useful ones are: Data.Integer (integers), Data.Float (floating point
numbers), Data.Bin (binary natural numbers), Data.Rational (rational numbers),
Data.String (strings), Data.Maybe (option types), Data.AVL (balanced
binary search trees).

Logic. Not much is defined in the core of the Agda language and most of the
type constructors are also defined in the standard library:

    Data.Sum                                 Sum types (⊎, ∨)
    Data.Product                             Product types (×, ∧, ∃, Σ)
    Relation.Nullary                         Negation (¬)
    Relation.Binary.PropositionalEquality    Equality (≡)

Algebra. The standard library contains modules for useful algebraic structures
in Algebra.*: monoids, rings, groups, lattices, etc.


6.2.4 Hello world. A mandatory example is the “hello world” program, see 
section 1.1.1. We can of course write it in Agda: 


{-# OPTIONS --guardedness #-} 


open import Level 
open import IO 


main : IO {a = 0ℓ} _
main = putStrLn "Hello, world!" 


We however only give it for fun here: you will very rarely write such a program.
A more realistic example is detailed in the next section.


6.2.5 Our first proof. As a first proof, let’s show that the propositional for-
mula

A ∧ B ⇒ B ∧ A


is valid. By the Curry-Howard correspondence, we want a program of the type

A × B → B × A


showing that × is commutative. In OCaml, we would have typed


# let prod_comm (a , b) = (b , a);;
val prod_comm : 'a * 'b -> 'b * 'a = <fun>


The full proof in Agda goes as follows: 


open import Data.Product 


-- The product is commutative 
×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B (a , b) = (b , a)


We should try to explain the various things we see there. 

Importing modules. The programs in Agda, including the standard library, are 
organized in modules which are collections of functions dedicated to some fea- 
ture. Here, we want to use the product, and therefore we have to import the 


corresponding module, which is called Data.Product in the standard library. In 
order to do so, we use the open import command which loads all its functions. 


Comments. In Agda, comments start with -- (two dashes), as in the line preceding
the declaration of the function.


Declaring functions. We are defining a function named ×-comm. A function
declaration always contains (at least) two lines. The first one is of the form 


name : type 


declaring that the function name will have the type type, and the second one is 
of the form 


name a₁ ... aₙ = value


declaring that the function name takes arguments aᵢ and returns a given value.


Types. Let us detail the type 
(A B : Set) → (A × B) → (B × A)


we have given to the function. As indicated above, in OCaml the type would 
have been 


'a * 'b -> 'b * 'a


which means that for any types 'a and 'b the function can have the above type.
In Agda, there is no such implicit universal quantification over types, which
means that we have to do that by ourselves. We can do this because


1. we have the special type Set which is “the type of types” (we have uni- 
verses), 


2. we have the ability to name the arguments in types and use them in further 
types (we have dependent types). 


The type of the function will thus read as: given arguments A and B of type Set
(i.e. given types A and B), given a third argument of type A × B, we return a
result of type B × A. The fact that the arguments A and B are grouped here is
purely a syntactic convenience, and the above type is exactly the same as


(A : Set) → (B : Set) → (A × B) → (B × A)


Function definitions. The definition of the function is then the expected one

×-comm A B (a , b) = (b , a)


We take three arguments: A, B and a pair (a , b) and return the pair (b , a). 
Note that the fact that we can write (a , b) for the third argument is because 
Agda allows definitions by pattern matching (just as OCaml): here, the product 
has only one constructor, the pair. 
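
The same recipe applies when the type has several constructors. As a small sketch (this example is not part of the development above, and the name ⊎-comm is chosen here purely for illustration), the commutativity of disjunction A ∨ B ⇒ B ∨ A can be shown on the sum type _⊎_ of Data.Sum, whose two constructors inj₁ and inj₂ give rise to two cases in the pattern matching:

```agda
open import Data.Sum using (_⊎_; inj₁; inj₂)

-- The sum is commutative (illustrative name, not taken from the text)
⊎-comm : (A B : Set) → (A ⊎ B) → (B ⊎ A)
⊎-comm A B (inj₁ a) = inj₂ a   -- a proof of A becomes a right injection
⊎-comm A B (inj₂ b) = inj₁ b   -- a proof of B becomes a left injection
```

Here one line is needed per constructor, whereas the product only required one since the pair is its only constructor.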


Spaces. A minor point, which is sometimes annoying at first, is that spaces 
for constructors are important: you have to write (a , b) and not (a, b) or 
(a,b). This is because the syntax of Agda is really extensible (the notation for 
pairings is not built in for instance, it is defined in Data.Product!), which comes 
with some induced limitations. A side effect of this convention is that a,b is 
a perfectly legit variable name (but it is not necessarily a good idea to make 
heavy use of this opportunity). 
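
To make the point about spaces concrete, here is a minimal sketch (the names a,b and pair are only illustrative) in which the same characters are read either as a single identifier or as an application of the pairing constructor, depending on spacing:

```agda
open import Data.Nat using (ℕ)
open import Data.Product using (_×_; _,_)

-- Without spaces, a,b is one identifier: here it names a natural number.
a,b : ℕ
a,b = 5

-- With spaces, _,_ is the pairing constructor from Data.Product.
pair : ℕ × ℕ
pair = 3 , 4
```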


Typesetting UTF-8 symbols. Since we want our proofs to look fancy, we have
used some nice UTF-8 symbols: for instance “→” and “×”. In the editor, such
symbols are typed by commands such as \to or \times as indicated above, in
section 6.2.2. There are usually text replacements (e.g. we could have written
-> and *), but those are not used much in Agda.


6.2.6 Our first proof, step by step. The above proof is very short, so that 
we could have typed it at once and then made sure that it typechecks, but 
even for moderately sized proofs, it is out of the question to write them in one 
go. Fortunately, we can input those gradually, by leaving “holes” in the proofs 
which are refined later. Let us detail how one would have done this proof step 
by step, in order to introduce all the shortcuts. 

We begin by giving the type of the function and its declaration as 


×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B p = ?


We know that our function takes three arguments (A, B and p), which is obvious
from the type, but we did not think hard enough yet of the result so that we
have written ? instead, which can be thought of as a “hole” in the proof. We
can then typecheck the proof by typing C-c C-l. Basically, this makes sure that
Agda is aware of what is in the editor (and reports errors) so that you should
use it whenever you have changed something in the file (outside a hole). Once
we do that, the file is highlighted and changed to


×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B p = { }0


The hole has been replaced by { }0, meaning that Agda is waiting for some
term here (the 0 is the number of the hole). Now, place your cursor in the hole.
We can see the variables at our disposal (i.e. the context) by typing C-c C-,:


Goal: B × A
p : A × B
B : Set
A : Set


This is useful to know where we are exactly in the proof: here we want to prove
B × A with A, B and p of given types. Now, we want to reason by case analysis
on p. We therefore use the shortcut C-c C-c: Agda then asks for the variable
on which we want to reason by case analysis, and we reply p (and enter). The
file is then changed to


×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B (fst , snd) = { }0


Since the type of p is a product, p must be a pair and therefore Agda changes p 
to the pattern (fst , snd). Since we do not like the default names given by 
Agda to the variables, we rename fst to a and snd to b: 


×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B (a , b) = { }0


We should then do C-c C-l so that Agda knows of this change (remember that
we have to do it each time we modify something outside a hole). Now, we place
our cursor into the hole. By the same reasoning, the hole has a product as a
type, so that it must be a pair. We therefore use the command C-c C-r which
“refines” the hole, i.e. introduces the constructor if there is only one possible for
the given type. The file is then changed to


×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B (a , b) = { }1 , { }2


The hole was changed into a pair of two holes. In the hole { }1, we know that
the value should be b. We can therefore write b inside it and type C-c C-space
to indicate that we have given the value to fill the hole:


×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B (a , b) = b , { }2


We could do the same for the second hole (by giving a), but we get bored: this
hole is of type A so that the only possible value for it was a anyway. Agda is
actually able to find that if we type C-c C-a, which is the command for letting
the proof assistant try to automatically fill a hole:


×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B (a , b) = b , a


6.2.7 Our first proof, again. We would like to point out that these steps ac-
tually (secretly) correspond to constructing a proof. For simplicity, we suppose
that A and B are two fixed types, which can be done by typing


postulate A B : Set


and consider the proof


×-comm : (A × B) → (B × A)
×-comm (a , b) = b , a


which is a small variant of the previous one. We now explain that constructing this
proof corresponds to constructing a proof in sequent calculus. As a general rule:


— doing a case split on a variable (C-c C-c) corresponds to performing a left 
rule (or an elimination rule in natural deduction), 


— refining a hole (C-c C-r) corresponds to performing a right rule (or an
introduction rule in natural deduction),


— providing a variable term (C-c C-space) corresponds to performing an 
axiom rule. 


In figure 6.1, we have shown how the steps of our proof in Agda translate 
into the construction of the proof from bottom up, in sequent calculus. Also 
note that there is a perfect correspondence with respect to the Curry-Howard 
correspondence if we allow ourselves to put patterns instead of variables in the 
context: 


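 ──────────────────── (ax)    ──────────────────── (ax)
 a : A, b : B ⊢ b : B         a : A, b : B ⊢ a : A
 ───────────────────────────────────────────────── (∧R)
          a : A, b : B ⊢ (b, a) : B ∧ A
 ───────────────────────────────────────────────── (∧L)
         (a, b) : A ∧ B ⊢ (b, a) : B ∧ A
 ───────────────────────────────────────────────── (⇒R)
        ⊢ λ(a, b).(b, a) : A ∧ B ⇒ B ∧ A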


This correspondence has some defects in general [Kri09], which is why we do 
not detail it further here. 


6.3 Basic Agda


In this section we present the main constructions which are present in the core 
of Agda, with the notable exception of inductive types which are described in 
sections 6.4 and 6.5. 




Agda                            Shortcut         Proof                        Rule

×-comm = { }0                   C-c C-r                   ?                   (⇒R)
                                                   ⊢ A ∧ B ⇒ B ∧ A

×-comm p = { }0                 C-c C-c p                 ?                   (∧L)
                                                    A ∧ B ⊢ B ∧ A
                                                   ⊢ A ∧ B ⇒ B ∧ A

×-comm (a , b) = { }0           C-c C-r                   ?                   (∧R)
                                                     A, B ⊢ B ∧ A
                                                    A ∧ B ⊢ B ∧ A
                                                   ⊢ A ∧ B ⇒ B ∧ A

×-comm (a , b) = { }1 , { }2    b C-c C-space         ?          ?            (ax)
                                                  A, B ⊢ B   A, B ⊢ A
                                                     A, B ⊢ B ∧ A
                                                    A ∧ B ⊢ B ∧ A
                                                   ⊢ A ∧ B ⇒ B ∧ A

×-comm (a , b) = b , { }2       a C-c C-space                    ?            (ax)
                                                  A, B ⊢ B   A, B ⊢ A
                                                     A, B ⊢ B ∧ A
                                                    A ∧ B ⊢ B ∧ A
                                                   ⊢ A ∧ B ⇒ B ∧ A

×-comm (a , b) = b , a                            A, B ⊢ B   A, B ⊢ A
                                                     A, B ⊢ B ∧ A
                                                    A ∧ B ⊢ B ∧ A
                                                   ⊢ A ∧ B ⇒ B ∧ A


Figure 6.1: Agda proofs and sequent proofs.


6.3.1 The type of types. In Agda, there is by default a type named Set,
which can be thought of as the type of all types: an element of type Set is a 
type. 


6.3.2 Arrow types. In Agda, we have the possibility of forming function types:
given types A and B, one can form the type

A → B


of functions taking an argument of type A and returning a value of type B. For
instance, the function isEven which determines whether a natural number is
even will be given the type


isEven : ℕ → Bool


Type constructors. Functions in Agda can operate on types. For instance, the
type of lists is a type constructor: it is a function which takes a type A as
argument and produces a new type List A, the type of lists whose elements are
of type A. We can thus give it the type


List : Set → Set


The type List A can also be seen as a type which is parametrized by another
type, just as in OCaml the type 'a list of lists is parametrized by the type 'a.


Named arguments. In Agda, we can give a name to the arguments in types,
e.g. we can give the name x to A and consider the type

(x : A) → B


For instance, the isEven function could also have been given the type


isEven : (x : ℕ) → Bool


However, the added power comes from the fact that the type B is also allowed 
to make use of the variable x. For instance, the function which constructs a 
singleton list of some type can be given the following type (see section 6.3.3 for 
the full definition of this function): 


singleton : (A : Set) → A → List A


Both the second argument and the result use the type A which is given as first 
argument. Such a type is called a dependent type: it can depend on a value, 
which is given as an argument. 


Universal quantification. Another way to read the type (x : A) → B is as a uni-
versal quantification: it corresponds to what we would have previously written


∀x ∈ A. B


For instance, we can define the type of equalities between two elements of a
given type A by


eq : (A : Set) → A → A → Set


and a proof that this equality is reflexive is given the type


refl : (A : Set) → (x : A) → eq A x x


which corresponds to the usual formula


∀A. ∀x ∈ A. x = x


Implicit arguments. Sometimes, some arguments can be deduced from the type 
of other arguments. For instance, in the singleton function above, A is the type 
of the second argument. In this case, we can make the first argument implicit, 
which means that we will not have to write it and we will let Agda guess it 
instead. This is done by using curly brackets in the type 


singleton : {A : Set} → A → List A


This allows us to simply write


singleton 3


and Agda will be able to find out that A has to be ℕ, since this is the type of
3. In case we want to specify the implicit argument, we have to use the same
brackets:


singleton {ℕ} 3


Another way of having Agda make a guess is to use _ which is a placeholder that
has to be filled automatically by Agda. For instance, we could try to let Agda
guess the type of A (which is Set) by declaring


singleton : {A : _} → A → List A


which can equivalently be written


singleton : ∀ {A} → A → List A
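

The different ways of supplying an implicit argument can be compared side by side. The following is a minimal sketch, assuming the standard library's Data.List and Data.Nat; the names ex₁, ex₂, ex₃ and same are only illustrative. The last form, {A = ℕ}, gives the implicit argument by name, which is convenient when a function has several implicit arguments.

```agda
open import Data.Nat using (ℕ)
open import Data.List using (List; []; _∷_)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

singleton : {A : Set} → A → List A
singleton x = x ∷ []

-- The implicit argument is inferred from the type of 3 ...
ex₁ : List ℕ
ex₁ = singleton 3

-- ... or given positionally between curly brackets ...
ex₂ : List ℕ
ex₂ = singleton {ℕ} 3

-- ... or given by name.
ex₃ : List ℕ
ex₃ = singleton {A = ℕ} 3

-- All three denote the same list.
same : ex₁ ≡ ex₃
same = refl
```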

6.3.3 Functions. As indicated in section 6.2.5, a function definition begins 
with a line specifying the type of the function, followed by the definition of the 
function itself. For instance, the singleton function which takes an element x of 


some arbitrary type A and returns the list with x as the only element, can be 
defined as 


singleton : (A : Set) → A → List A
singleton A x = x ∷ []


or as


singleton : {A : Set} → A → List A
singleton x = x ∷ []


or as


singleton : {A : Set} → A → List A
singleton = λ x → x ∷ []


In the second variant, A is an implicit argument. In the third variant, we
illustrate the fact that we can use λ-abstractions to define anonymous functions.


Infix notations. In function names, underscores (_) are handled as places where
the arguments should be put, which makes it easy to define infix operators. For
instance, we can define the addition with type


_+_ : ℕ → ℕ → ℕ


and then use it as


3 + 2


The prefix notation is still available though, in case it is needed:


_+_ 3 2


The priorities of binary operators can be specified by commands such as


infix 6 _+_


which states that the priority of addition is 6 (the higher the number, the
higher the priority). Operations can also be specified to be left (resp. right)
associative by replacing infix by infixl (resp. infixr). For instance, addition
and multiplication are usually given priorities


infixl 6 _+_
infixl 7 _*_


so that the expression


2 + 3 + 5 * 2


is implicitly bracketed as


(2 + 3) + (5 * 2)
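

The effect of these declarations can be checked directly by letting Agda compute. The sketch below assumes the operators _+_ and _*_ of Data.Nat, which are declared with the priorities given above; the names prefix-ex and bracketing-ex are only illustrative.

```agda
open import Data.Nat using (ℕ; _+_; _*_)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

-- Prefix and infix notations denote the same term.
prefix-ex : _+_ 3 2 ≡ 3 + 2
prefix-ex = refl

-- The expression parses as (2 + 3) + (5 * 2), which computes to 15.
-- The other bracketing, ((2 + 3) + 5) * 2, would compute to 20,
-- in which case refl would be rejected.
bracketing-ex : 2 + 3 + 5 * 2 ≡ 15
bracketing-ex = refl
```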


Auxiliary functions. In the definition of a function, it is possible to use auxiliary
function definitions using the where keyword. For instance, we can define the
function f which computes the fourth power of a natural number, i.e. f(x) = x⁴,
by using the square function as an auxiliary function, i.e. f(x) = (x²)², as
follows:


fourth : ℕ → ℕ
fourth n = square (square n)
  where
  square : ℕ → ℕ
  square n = n * n


Here, we define the fourth function in terms of the auxiliary function square, 
which is defined afterwards, preceded by the where keyword. 


6.3.4 Postulates. It rarely happens that we need to assume the existence of 
a term of a given type without any hope of proving it: this is typically the case 
for axioms. This can be achieved by the postulate keyword. For instance, in 
order to work in classical logic, we can assume the law of excluded middle with 


postulate lem : (A : Set) → ¬ A ⊎ A


These should be avoided as much as possible because postulates will not com-
pute: if we apply lem to an actual type A, it will not reduce to either ¬ A or A,
as we would expect for a coproduct, see section 6.5.6: how could Agda possibly 
know which one is the right one? 
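

As an illustration of how such a postulate is used, here is a sketch deriving double negation elimination from lem (the name dne is chosen here for illustration; the imports assume the standard library). The definition typechecks, but since lem A is stuck, a term such as dne A p will not reduce any further either.

```agda
open import Data.Empty using (⊥-elim)
open import Data.Sum using (_⊎_; inj₁; inj₂)
open import Relation.Nullary using (¬_)

postulate lem : (A : Set) → ¬ A ⊎ A

-- Double negation elimination, by case analysis on lem A.
dne : (A : Set) → ¬ (¬ A) → A
dne A nna with lem A
... | inj₁ na = ⊥-elim (nna na)   -- ¬ A contradicts ¬ ¬ A
... | inj₂ a  = a                 -- we directly have an element of A
```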


6.3.5 Records. Records in Agda are pretty similar to those in other languages
(e.g. OCaml) and will not be used much here. In order to illustrate the syntax,
we provide here an implementation of pairs using records:


record Pair (A B : Set) : Set where
  field
    fst : A
    snd : B


make-pair : {A B : Set} → A → B → Pair A B
make-pair a b = record { fst = a ; snd = b }


proj₁ : {A B : Set} → Pair A B → A
proj₁ p = Pair.fst p
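

As a further sketch of the record syntax (the function swap is only illustrative), opening the record module brings the field names into scope as projection functions, so that Pair.fst p can be written fst p:

```agda
record Pair (A B : Set) : Set where
  field
    fst : A
    snd : B

-- Bring the projections fst and snd into scope.
open Pair

-- Exchange the two components of a pair.
swap : {A B : Set} → Pair A B → Pair B A
swap p = record { fst = snd p ; snd = fst p }
```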


6.3.6 Modules. A module is a collection of functions. It can be declared by 
putting 


module Name where 


at the beginning of the file, where Name is the name of the module and should 
match the name of the file. The functions of another module can be used by 
issuing the command 


open import Name 


which will expose all the functions of the module Name. After this command, 
the modifiers hiding (...) or renaming (... to ...) can be used in order to 
hide or rename some of the functions. 


6.4 Inductive types: data 


Inductive types are the main way of defining new types in Agda. Apart from 
a few exceptions (such as → and Set mentioned above), all the usual types are
defined using this mechanism in the standard library, including usual data types 
and logical connectives; we first focus on data types in this section. An inductive 
type T is declared using a statement of the form 


data T : A where
  cons₁ : A₁ → ... → Aᵢ → T
  ...
  consₙ : B₁ → ... → Bⱼ → T


which declares that T is an inductive type, whose type is A, with constructors
cons₁, ..., consₙ. For each constructor, the line begins with two blank spaces,
followed by the name of the constructor, and ends with the type of the con- 
structor. Each constructor takes an arbitrary number of arguments and has T 
as return type. Since the type T we are defining is itself a type, A is usually Set, 
although some more general inductive types are supported (for instance, they 
can depend on some other types, see section 6.4.7). 


6.4.1 Natural numbers. As a first example, the natural numbers are defined
as the inductive type ℕ in the module Data.Nat by


data ℕ : Set where
  zero : ℕ
  suc : ℕ → ℕ


The first constructor is zero, which does not take any argument, and the second
constructor is suc, which takes a natural number as argument. The values of
type ℕ are thus


zero    suc zero    suc (suc zero)    suc (suc (suc zero))


and so on. As a convenience, the usual notation for natural numbers is also 
supported and we can write 2 as a shorthand for suc (suc zero). 


6.4.2 Pattern matching. The way one typically uses elements of an inductive
type is by pattern matching: it allows inspecting a value of an inductive type
and returning a result depending on the constructor of the value. As explained
above, the cases are usually generated by using the C-c C-c shortcut which
instructs the editor to perform case analysis on some variable. For instance, in
order to define the predecessor function, we start with


pred : ℕ → ℕ
pred n = ?


then, by C-c C-c we indicate that we want to reason by case analysis on n, 
which turns the code into 


pred : ℕ → ℕ
pred zero = ?
pred (suc n) = ?


We now have to give the result of the function when the argument is zero (by
convention the predecessor of 0 is 0) and when the argument is suc n, where
n is a natural number. We can finally fill in the holes in order to define the
predecessor:


pred : ℕ → ℕ
pred zero = zero
pred (suc n) = n


Of course, pattern matching also works with multiple arguments and we can 
define addition by 


_+_ : ℕ → ℕ → ℕ
zero + n = n
suc m + n = suc (m + n)


This definition can be tested by defining

t = 3 + 2


and using C-c C-n to normalize t (which gives 5 as answer). Subtraction can be
defined similarly by


_∸_ : ℕ → ℕ → ℕ
zero ∸ n = zero
suc m ∸ zero = suc m
suc m ∸ suc n = m ∸ n


(by convention m − n = 0 when m < n) and multiplication by


_*_ : ℕ → ℕ → ℕ
zero * n = zero
suc m * n = (m * n) + n
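

Instead of normalizing terms interactively, small tests can also be recorded in the file as equalities proved by refl, which Agda accepts only if both sides compute to the same value. The following sketch uses the operations of Data.Nat from the standard library (assumed here to compute the same functions as the definitions above); the names of the tests are only illustrative.

```agda
open import Data.Nat using (ℕ; zero; suc; _+_; _∸_; _*_)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

pred : ℕ → ℕ
pred zero    = zero
pred (suc n) = n

-- By convention, the predecessor of 0 is 0.
pred-ex : pred 0 ≡ 0
pred-ex = refl

add-ex : 3 + 2 ≡ 5
add-ex = refl

-- Subtraction is truncated: m ∸ n is 0 when m < n.
sub-ex : 2 ∸ 5 ≡ 0
sub-ex = refl

mul-ex : 3 * 4 ≡ 12
mul-ex = refl
```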


Matching with other values. It is sometimes useful to define a function by case 
analysis on a value which is not an argument. In this case, we can use the 
with keyword followed by the value we want to match on. This value can then 
be matched as an extra argument, which has to be separated from the other 
argument by a symbol |. For instance, the modulo function on natural numbers 
can be defined by induction on the second argument by the following definition: 


m mod n = m                if m < n
m mod n = (m − n) mod n    otherwise


Here, we do not want to reason directly by induction on n, which would force
us to distinguish the case where n is zero or a successor, but rather on whether
m < n holds or not. This can be achieved by matching on m <? n which will
either be yes _ or no _ depending on whether m < n or m ≮ n (the arguments
of those constructors are not important for the moment and will be detailed in
section 6.5.6).

We begin our definition as usual with 


_mod_ : ℕ → ℕ → ℕ
m mod n = ?


Since we want to match on m <? n, we use the with keyword in order to match
on it additionally to the arguments:


_mod_ : ℕ → ℕ → ℕ
m mod n with m <? n
m mod n | p = ?


and we can then reason by case analysis on p. Incidentally, we can avoid typing 
again the match on the arguments of the function by simply writing “...”: 


_mod_ : ℕ → ℕ → ℕ
m mod n with m <? n
... | p = ?


At this point, we reason by case analysis on p (with C-c C-c p) which will 
produce two cases depending on the value of p: 


_mod_ : ℕ → ℕ → ℕ
m mod n with m <? n
m mod n | yes _ = ?
m mod n | no _ = ?


We can finally fill those two cases, as indicated by the above formula:


_mod_ : ℕ → ℕ → ℕ
m mod n with m <? n
m mod n | yes _ = m
m mod n | no _ = (m ∸ n) mod n


As a side note, if you actually try the above definition in Agda, you will see that 
it gets rejected because it is not clear for Agda that it is actually terminating. 
The actual definition is slightly more involved because of this, see section 6.8. 
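

One common workaround, sketched here only as an illustration and not as the definition of section 6.8, is to add an extra "fuel" argument on which the recursion is structural. The helper mod-fuel and its fuel parameter are hypothetical names introduced for this sketch; the imports assume the standard library.

```agda
open import Data.Nat using (ℕ; zero; suc; _∸_; _<?_)
open import Relation.Nullary using (yes; no)

-- The first argument bounds the number of recursive calls: it decreases
-- structurally at each call, so that the termination checker is satisfied.
mod-fuel : ℕ → ℕ → ℕ → ℕ
mod-fuel zero       m n = m
mod-fuel (suc fuel) m n with m <? n
... | yes _ = m
... | no  _ = mod-fuel fuel (m ∸ n) n

-- m units of fuel are enough: when n is non-zero, each step decreases m.
_mod_ : ℕ → ℕ → ℕ
m mod n = mod-fuel m m n

-- Normalizing 7 mod 3 with C-c C-n gives 1.
```

The price to pay is that the equation of the original definition is no longer the literal definition of the function, and that m mod 0 simply returns m once the fuel is exhausted.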


Empty pattern matching. Some inductive types do not have any element. For
instance, we can define the empty type ⊥ as


data ⊥ : Set where


(this is an inductive type with no elements). When performing pattern matching 
on elements of this type there can be no match. In order to represent this, Agda 
uses the pattern (), which means that no such pattern can happen. For instance, 
one can show that if we have an element of type ⊥ then we have an element of
an arbitrary type A as follows:


⊥-elim : {A : Set} → ⊥ → A
⊥-elim ()


Of course, since the type A is arbitrary, there is no way for us in the proof to
actually exhibit a term of this type. But we do not have to: the pattern ()
states that there are no cases to handle when matching on the argument of
type ⊥, so that we are done.

It might seem at first that this is not so useful, unless one insists on using
the type ⊥ (which is actually done quite often since negation is defined using it
as you can expect). This is not so because there are many less obvious ways of
constructing empty inductive types in Agda. For instance, the type zero ≡ suc
zero of equalities between 0 and 1 is also an empty inductive type.
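

This last remark can be turned into a small proof. In the sketch below (the name 0≢1 is chosen for illustration, and the imports assume the standard library), Agda accepts the absurd pattern because no constructor of the equality type can have type zero ≡ suc zero:

```agda
open import Data.Empty using (⊥)
open import Data.Nat using (zero; suc)
open import Relation.Binary.PropositionalEquality using (_≡_)

-- There is no proof that 0 equals 1: the absurd pattern () closes the case.
0≢1 : zero ≡ suc zero → ⊥
0≢1 ()
```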


Anonymous pattern matching. Anonymous functions can be defined by pattern 
matching, although the syntax is slightly different from what one would expect: 
we need to put curly brackets before the arguments, and cases are separated by 
semicolons: 


λ { pattern₁ → result₁ ; pattern₂ → result₂ ; ... }

For instance, the predecessor can be defined as an anonymous function by


pred : ℕ → ℕ
pred = λ { zero → zero ; (suc n) → n }


6.4.3 The induction principle. We would now like to briefly mention that 
pattern matching in Agda corresponds to the presence of a recursion princi- 
ple (for non-dependent functions) or of an induction principle (for dependent 
functions). 

For instance, if we define a function f from natural numbers to some type A, 
we will typically define it using pattern matching by 


f : ℕ → A
f zero = t
f (suc n) = u'


where t and u' are terms of type A. Here, u' might make use of the natural
number n provided as the argument, as well as the result of the recursive call
f n: we can suppose that u' is of the form u n (f n) for some function u of type
ℕ → A → A. Any such terms t and u will give rise to a function of type ℕ → A
in this way, and the recursion principle expresses this through a function which
takes two arguments (of type A and ℕ → A → A, respectively corresponding to t
and u) and produces the resulting function:


rec : {A : Set} → A → (ℕ → A → A) → ℕ → A
rec t u zero = t
rec t u (suc n) = u n (rec t u n)


This is precisely the recursor we have already met when adding natural numbers
to simply typed λ-calculus in section 4.3.6. Moreover, any function of type ℕ
→ A defined using pattern matching can be defined using this function instead:
this recursion function encapsulates all the expressive power of pattern matching
that can be used in order to define non-dependent functions on natural numbers.
For instance, the predecessor function would be defined as


pred : ℕ → ℕ
pred = rec zero (λ n _ → n)


From a logical point of view, the recursion principle corresponds to the elimi-
nation rule: for this reason it is sometimes also called an eliminator.
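

As another illustration of the recursor, here is a sketch of addition defined without pattern matching (the names add and add-ex are only illustrative). The recursion is performed on the first argument: the base case returns n, and the step function ignores the predecessor and applies suc to the result of the recursive call.

```agda
open import Data.Nat using (ℕ; zero; suc)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

rec : {A : Set} → A → (ℕ → A → A) → ℕ → A
rec t u zero    = t
rec t u (suc n) = u n (rec t u n)

-- add zero n = n and add (suc m) n = suc (add m n), expressed with rec.
add : ℕ → ℕ → ℕ
add m n = rec n (λ _ r → suc r) m

add-ex : add 3 2 ≡ 5
add-ex = refl
```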

Pattern matching in Agda is more powerful than this however: it can also be 
used in order to define functions where the return type depends on the argument. 
This means that we now consider functions of the form 


f : (n : ℕ) → P n
f zero = t
f (suc n) = u n (f n)


where P n is a type which depends on n, or equivalently P is a predicate, of
type ℕ → Set: here, t is of type P zero and u n (f n) is of type P (suc n).
The corresponding dependent variant of the recursion principle is called the
induction principle and is the following one:


rec : (P : ℕ → Set) → P zero →
      ((n : ℕ) → P n → P (suc n)) → (n : ℕ) → P n
rec P Pz Ps zero = Pz
rec P Pz Ps (suc n) = Ps n (rec P Pz Ps n)


Given
— a predicate P,
— an element t of P zero, and
— a function u of type (n : ℕ) → P n → P (suc n),
this function allows us to construct a function of type (n : ℕ) → P n. If, fol-
lowing the Curry-Howard correspondence, we read the type as a logical formula
(see section 6.5), we precisely recover the usual induction principle over natural
numbers:


P(0) ⇒ (∀n ∈ ℕ. P(n) ⇒ P(n + 1)) ⇒ ∀n ∈ ℕ. P(n)


For instance, the following proof by induction that n + 0 = n for every natural
number n


+-zero : (n : ℕ) → n + zero ≡ n
+-zero zero = refl
+-zero (suc n) = cong suc (+-zero n)


can be expressed as follows using the induction principle:


+-zero : (n : ℕ) → n + zero ≡ n
+-zero = rec (λ n → n + zero ≡ n) refl (λ n p → cong suc p)


6.4.4 Booleans. The type of booleans is defined in Data.Bool by


data Bool : Set where 
false : Bool 
true : Bool 


so that, for instance, boolean negation is defined by 


neg : Bool → Bool
neg false = true 
neg true = false 


and conjunction by 


_∧_ : Bool → Bool → Bool
false ∧ _ = false
true ∧ false = false
true ∧ true = true


In Agda, even conditional branchings are defined by pattern matching: 


if_then_else_ : {A : Set} → Bool → A → A → A
if true then x else y = x
if false then x else y = y


Finally, the induction principle for booleans is 


Bool-rec : (P : Bool → Set) → P false → P true →
           (b : Bool) → P b
Bool-rec P Pf Pt false = Pf
Bool-rec P Pf Pt true = Pt
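

To see this principle at work, here is a sketch of a proof that negation is involutive (the name neg-involutive is chosen for illustration). The predicate is P b = (neg (neg b) ≡ b), and each of the two cases holds by computation, so that both are proved by refl:

```agda
open import Data.Bool using (Bool; true; false)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

neg : Bool → Bool
neg false = true
neg true  = false

Bool-rec : (P : Bool → Set) → P false → P true → (b : Bool) → P b
Bool-rec P Pf Pt false = Pf
Bool-rec P Pf Pt true  = Pt

-- Case false: neg (neg false) reduces to false.
-- Case true:  neg (neg true) reduces to true.
neg-involutive : (b : Bool) → neg (neg b) ≡ b
neg-involutive = Bool-rec (λ b → neg (neg b) ≡ b) refl refl
```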


6.4.5 Lists. Lists are defined in Data.List by


data List (A : Set) : Set where
  [] : List A
  _∷_ : A → List A → List A


Here, “∷” is one UTF-8 symbol (entered with \::) and not two colons. As
indicated above, the type List depends on another type A, called the parameter
of the inductive type. The resulting type is thus called a parametric type or a type
constructor. The usual functions are defined as usual by induction, for instance,
we can define the function which associates to a list its length by


length : {A : Set} → List A → ℕ
length [] = 0
length (x ∷ l) = suc (length l)


the function which applies a function to every element of a list by


map : {A B : Set} → (A → B) → List A → List B
map f [] = []
map f (x ∷ l) = f x ∷ map f l


or the function which concatenates two lists by


_++_ : {A : Set} → List A → List A → List A
[] ++ l' = l'
(x ∷ l) ++ l' = x ∷ (l ++ l')


Finally, the induction principle for lists is:


List-rec : {A : Set} → (P : List A → Set) → P [] →
           ((x : A) → (xs : List A) → P xs → P (x ∷ xs)) →
           (xs : List A) → P xs
List-rec P Pe Pc [] = Pe
List-rec P Pe Pc (x ∷ xs) = Pc x xs (List-rec P Pe Pc xs)
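

As an example of use of this principle, here is a sketch of a proof that the empty list is a right unit for concatenation (the name ++-[] is only illustrative; the list type and _++_ are taken from the standard library, assuming they behave as the definitions above). The base case holds by computation, and the inductive case adds x in front of both sides of the induction hypothesis p.

```agda
open import Data.List using (List; []; _∷_; _++_)
open import Relation.Binary.PropositionalEquality using (_≡_; refl; cong)

List-rec : {A : Set} → (P : List A → Set) → P [] →
           ((x : A) → (xs : List A) → P xs → P (x ∷ xs)) →
           (xs : List A) → P xs
List-rec P Pe Pc []       = Pe
List-rec P Pe Pc (x ∷ xs) = Pc x xs (List-rec P Pe Pc xs)

-- Concatenating the empty list on the right does nothing.
++-[] : {A : Set} (l : List A) → l ++ [] ≡ l
++-[] = List-rec (λ l → l ++ [] ≡ l) refl (λ x xs p → cong (x ∷_) p)
```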


6.4.6 Options. Option types are defined in Data.Maybe by 


data Maybe (A : Set) : Set where
  just : A → Maybe A
  nothing : Maybe A


A value of this type is thus either nothing or just x for some value x of type A. 
The type Maybe A can thus be seen as the type A extended with one new value, 
nothing (it corresponds to option types of OCaml, see section 1.3.4). It is 
often useful in order to accommodate for exceptional values (where we would 
use “NULL pointers” in other languages). For instance, the function returning 
the head of a list is not defined when the list is empty. It can be given the 
following definition: 


head : {A : Set} → List A → Maybe A
head [] = nothing
head (x ∷ l) = just x


This function is a bit cumbersome to use: each time we have to test whether the 
result is nothing or not (monads [Mog91] might be of some help here though). 
A more elegant solution is provided below. 


6.4.7 Vectors. A vector is a list of given length. The type of vectors is defined
in Data.Vec by


data Vec (A : Set) : ℕ → Set where
  [] : Vec A zero
  _∷_ : {n : ℕ} → A → Vec A n → Vec A (suc n)


An element of Vec A n can be seen as a list whose elements are of type A and
whose length is n. In this type, we thus have both a parameter A of type Set
and an index of type ℕ, corresponding to the length of the vector, indicated
by the fact that the return type is ℕ → Set. Indices are roughly the same as
parameters, except that they can vary with constructors, as seen above: the
constructor [] produces a vector of length zero, whereas the constructor _∷_
produces a vector of length suc n. It is “pure coincidence” if the names of the
constructors are the same as for lists: they have nothing to do with those and
could have been named differently (however, people chose to name them in the
same way because vectors are usually used as a replacement for lists).


Dependent types. It should be observed that the type Vec A n of vectors depends
on a term n, the natural number indicating its length: this is a defining feature
of dependent types. We can also define functions such that the type of the
result depends on the argument. For instance, we have the following function,
building a vector containing n occurrences of a given value:


replicate : {A : Set} → A → (n : ℕ) → Vec A n
replicate x zero = []
replicate x (suc n) = x ∷ replicate x n


Dependent pattern matching. Another natural function on this type is the func- 
tion returning the head of the list: 


head : {n : ℕ} {A : Set} → Vec A (suc n) → A
head (x ∷ xs) = x


This is a good illustration of the dependent pattern matching present in Agda. 
Since the argument is a list of type Vec A (suc n), Agda automatically infers 
that this function will never be applied to an empty list, because it cannot have 
such a type, thus avoiding the problem we had when defining the same function 
on lists in section 6.4.6. 
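

The following sketch shows this function in use, with the vectors of Data.Vec from the standard library (the name head-ex is only illustrative). Applying head to a non-empty vector computes as expected, whereas an application to the empty vector does not even typecheck.

```agda
open import Data.Nat using (ℕ; suc)
open import Data.Vec using (Vec; []; _∷_)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

head : {n : ℕ} {A : Set} → Vec A (suc n) → A
head (x ∷ xs) = x

-- The vector 1 ∷ 2 ∷ [] has type Vec ℕ 2, which is of the form Vec ℕ (suc n).
head-ex : head (1 ∷ 2 ∷ []) ≡ 1
head-ex = refl

-- In contrast, head [] is rejected by the typechecker:
-- [] has type Vec A zero, and zero is not of the form suc n.
```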


Convertibility. Even though the type is more informative than the one of lists, 
typical functions are not significantly harder to write. For instance, the con- 
catenation of vectors is comparable to the one of lists: 


_++_ : {m n : ℕ} {A : Set} → Vec A m → Vec A n → Vec A (m + n)
[] ++ l = l
(x ∷ l) ++ l' = x ∷ (l ++ l')


Looking closely at the first case of the pattern matching, we can note that the
result l we are providing is of type Vec A n whereas the type of the function
indicates that we should provide a result of type Vec A (zero + n). This illus-
trates the fact that Agda is able to compare types up to β-reduction on terms
(zero + n reduces to n): we can never distinguish between two β-convertible
terms.
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Induction principle. The induction principle for vectors is: 


Vec-rec : {A : Set} → (P : {n : ℕ} → Vec A n → Set) → P [] →
          ({n : ℕ} (x : A) (xs : Vec A n) → P xs → P (x ∷ xs)) →
          {n : ℕ} → (xs : Vec A n) → P xs
Vec-rec P Pe Pc [] = Pe
Vec-rec P Pe Pc (x ∷ xs) = Pc x xs (Vec-rec P Pe Pc xs)


Indices instead of parameters. In the definition of vectors, we could have used 
an index instead of a parameter for the type A: 


data Vec : Set → ℕ → Set where
  []  : {A : Set} → Vec A zero
  _∷_ : {A : Set} {n : ℕ} (x : A) (xs : Vec A n) → Vec A (suc n)


This is a general fact: we can always encode a parameter as an index. However, 
it is recommended to use parameters whenever possible, because Agda handles 
them more efficiently. 

Also, with the above definition, the induction principle is slightly different: 


Vec-rec : (P : {A : Set} {n : ℕ} → Vec A n → Set) →
          ({A : Set} → P {A} []) →
          ({A : Set} {n : ℕ} (x : A) (xs : Vec A n) → P xs → P (x ∷ xs)) →
          {A : Set} → {n : ℕ} → (xs : Vec A n) → P xs
Vec-rec P Pe Pc [] = Pe
Vec-rec P Pe Pc (x ∷ xs) = Pc x xs (Vec-rec P Pe Pc xs)


6.4.8 Finite sets. In section 6.4.4, we have defined the set of booleans, which 
contains two elements, and clearly we could have defined a set with n elements 
for any fixed natural number n. For instance, the following type has four ele- 
ments: 


data Four : Set where
  a : Four
  b : Four
  c : Four
  d : Four


In fact, we can define, once and for all, a type Fin n which depends on a natural
number n and has n elements. The definition is done in Data.Fin by


data Fin : ℕ → Set where
  zero : {n : ℕ} → Fin (suc n)
  suc  : {n : ℕ} → Fin n → Fin (suc n)


Looking at it, we can see that Fin n is essentially the collection of natural
numbers restricted to

Fin n = {0, ..., n − 1}

Namely, the above inductive type corresponds to the following inductive
set-theoretic definition:

Fin 0 = ∅
Fin (n + 1) = {0} ∪ {i + 1 | i ∈ Fin n}

As for vectors, the fact that the constructors have the same name as for natural
numbers is “pure coincidence”: the elements of Fin n are not elements of ℕ,
although there is obviously a canonical mapping: 


toℕ : {n : ℕ} → Fin n → ℕ
toℕ zero = zero
toℕ (suc i) = suc (toℕ i)


Some black magic in Agda allows it to determine, using types, whether we are 
using the constructors of Fin or those of ℕ.


The lookup function. The type Fin n is typically used to index elements over 
finite sets. For instance, consider the lookup function, which returns the i-th 
element of a vector l of length n. Clearly, this function is only well defined when
i < n, i.e. when i belongs to Fin n. We can define this function as follows:


lookup : {n : ℕ} {A : Set} → Fin n → Vec A n → A
lookup zero (x ∷ l) = x
lookup (suc i) (x ∷ l) = lookup i l


The typing ensures that the index will always be such that the function is well- 
defined, i.e. that we will never request an element outside the boundaries of the 
vector. 
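
To see the typing discipline at work, here is a hedged, self-contained sketch (the names v, i₁, second and bad are ours; the types are redefined locally, and the BUILTIN pragma is only there so that we can write numeric literals). An in-bounds index is accepted, whereas an out-of-bounds one cannot even be written down:

```agda
data ℕ : Set where
  zero : ℕ
  suc  : ℕ → ℕ
{-# BUILTIN NATURAL ℕ #-}

infixr 5 _∷_

data Vec (A : Set) : ℕ → Set where
  []  : Vec A zero
  _∷_ : {n : ℕ} → A → Vec A n → Vec A (suc n)

data Fin : ℕ → Set where
  zero : {n : ℕ} → Fin (suc n)
  suc  : {n : ℕ} → Fin n → Fin (suc n)

lookup : {n : ℕ} {A : Set} → Fin n → Vec A n → A
lookup zero    (x ∷ l) = x
lookup (suc i) (x ∷ l) = lookup i l

v : Vec ℕ 3
v = 10 ∷ 20 ∷ 30 ∷ []

-- The index 1, seen as an element of Fin 3.
i₁ : Fin 3
i₁ = suc zero

-- Evaluates to 20.
second : ℕ
second = lookup i₁ v

-- Rejected if uncommented: the index 3 is not an element of Fin 3.
-- bad : Fin 3
-- bad = suc (suc (suc zero))
```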

Let us present other possible implementations of this function, using natural 
numbers as the type of i, in order to show that they are more involved and 
less elegant. Since the function is not defined for every natural number i, a
first possibility would be to have a return value of type Maybe A, where nothing 
would indicate that the function is not defined: 


lookup : ℕ → {A : Set} {n : ℕ} → Vec A n → Maybe A
lookup zero [] = nothing
lookup zero (x ∷ l) = just x
lookup (suc i) [] = nothing
lookup (suc i) (x ∷ l) = lookup i l


This is quite heavy to use in practice, because we have to account for the 
possibility that the function is not defined each time we use it. Another option 
could be to add as argument a proof of i < n, ensuring that the index is not 
out of bounds. This is more acceptable in practice, but the definition is not as 
direct as the one above: 


lookup : {i n : ℕ} {A : Set} → i < n → Vec A n → A
lookup {i} {.0} () []
lookup {zero} {.(suc _)} i<n (x ∷ l) = x
lookup {suc i} {.(suc _)} i<n (x ∷ l) = lookup (<-pred i<n) l


6.4.9 Integers. The type of integers can be defined essentially by taking two 
copies of the natural numbers: one corresponding to the positive integers and 
the other to the negative integers. If we proceed in this way, we however have 
two representations of zero (as 0 or —0), which should be identified. In order to 
avoid this problem, one of the two copies (here, the negative integers) is shifted 
by one. We thus define the type of integers as 


data ℤ : Set where
  pos    : ℕ → ℤ
  negsuc : ℕ → ℤ


The encoding of 0 is pos 0, 3 is pos 3 and −5 is negsuc 4 (note the shift by
one). The successor function suc is defined by induction by 


suc : ℤ → ℤ
suc (pos n) = pos (ℕ.suc n)
suc (negsuc ℕ.zero) = pos 0
suc (negsuc (ℕ.suc n)) = negsuc n


and the predecessor pred is defined similarly. Finally, addition can be imple- 
mented using those by 


_+_ : ℤ → ℤ → ℤ
pos ℕ.zero + n = n
pos (ℕ.suc m) + n = suc (pos m + n)
negsuc ℕ.zero + n = pred n
negsuc (ℕ.suc m) + n = pred (negsuc m + n)
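
The predecessor function is not spelled out in the text. A possible definition, given here as a sketch only, mirrors the successor function (it is named predℤ, and ℕ is redefined locally, merely to keep the example self-contained):

```agda
data ℕ : Set where
  zero : ℕ
  suc  : ℕ → ℕ

data ℤ : Set where
  pos    : ℕ → ℤ
  negsuc : ℕ → ℤ

-- pos n encodes n, and negsuc n encodes −(n + 1).
predℤ : ℤ → ℤ
predℤ (pos zero)    = negsuc zero      -- 0 − 1 = −1
predℤ (pos (suc n)) = pos n            -- (n + 1) − 1 = n
predℤ (negsuc n)    = negsuc (suc n)   -- −(n + 1) − 1 = −(n + 2)
```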


6.5 Inductive types: logic 


We have seen that inductive types can be used in order to implement usual data 
types (and more). We now explain that they can also be used to implement usual 
constructions on the logical side: through the Curry-Howard correspondence,
types can be read as logical formulas (and a program of a given type as a proof
of its type). We establish the translation between the two in this section. In this
way, Agda provides a formal framework in which proofs can be formalized, as
hinted in section 1.5, and we will see that it is much richer than the simply-typed
λ-calculus presented in chapter 4.


6.5.1 Implication. The first logical connective we have at our disposal is
implication, which corresponds to the arrow → in types. For instance, the
classical formulas

A ⇒ B ⇒ A        (A ⇒ B ⇒ C) ⇒ (A ⇒ B) ⇒ A ⇒ C

can respectively be proved by

K : {A B : Set} → A → B → A
K x y = x

S : {A B C : Set} → (A → B → C) → (A → B) → A → C
S g f x = g x (f x)

6.5.2 Product. Products are defined in Data.Product by 


data _×_ (A B : Set) : Set where
  _,_ : A → B → A × B

The first projection is defined as

proj₁ : {A B : Set} → A × B → A
proj₁ (a , b) = a

and the second projection is defined similarly. The projections are named proj₁
and proj₂ in the standard library, even though we sometimes like to rename
them as fst and snd. From a logical point of view, a product corresponds to
conjunction. For instance, a proof of the formula A ∧ B ⇒ B ∧ A, expressing the
commutativity of conjunction, was given in section 6.2.5. As another example,
currying (see section 4.3.1) can be shown by

×-→ : {A B C : Set} → (A × B → C) → (A → B → C)
×-→ f x y = f (x , y)

and

→-× : {A B C : Set} → (A → B → C) → (A × B → C)
→-× f (x , y) = f x y


Introduction rule. It can be observed that the constructor corresponds to the 
introduction rule for conjunction: 


Γ ⊢ A    Γ ⊢ B
---------------
   Γ ⊢ A ∧ B


This is a general fact: when defining logical connectives with inductive types, 
constructors correspond to introduction rules. We see below that the elimination 
rule corresponds to the associated induction principle. 


Induction principle. The induction principle for products is 


×-ind : {A B : Set} → (P : A × B → Set) →
        ((x : A) → (y : B) → P (x , y)) → (p : A × B) → P p
×-ind P Pp (x , y) = Pp x y


In the case where P does not depend on its argument, the above induction 
principle implies the following simpler principle 


×-rec : {A B : Set} → (P : Set) → (A → B → P) → A × B → P
×-rec P Pp (x , y) = Pp x y


which corresponds to the elimination rule for conjunction: 


Γ, A, B ⊢ P    Γ ⊢ A ∧ B
------------------------- (∧E)
          Γ ⊢ P


Namely, it states that if the premises are true then the conclusion is also true. 
The dependent induction principle corresponds to the elimination rule in de- 
pendent types, as we will see in section 8.3.3. 
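
As an illustration of the claim that the recursion principle plays the role of the elimination rule, the two projections can be recovered from it without any further pattern matching. This is a sketch only: the names fst and snd are ours and the types are redefined locally.

```agda
data _×_ (A B : Set) : Set where
  _,_ : A → B → A × B

×-rec : {A B : Set} → (P : Set) → (A → B → P) → A × B → P
×-rec P Pp (x , y) = Pp x y

-- From A ∧ B deduce A: eliminate with P := A, keeping the first hypothesis.
fst : {A B : Set} → A × B → A
fst {A} = ×-rec A (λ x y → x)

-- From A ∧ B deduce B: eliminate with P := B, keeping the second hypothesis.
snd : {A B : Set} → A × B → B
snd {A} {B} = ×-rec B (λ x y → y)
```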




6.5.3 Unit type. The unit type is defined in Data.Unit by 


data ⊤ : Set where
  tt : ⊤

From a logical point of view, the type corresponds to truth and the constructor
to the introduction rule

------ (⊤I)
Γ ⊢ ⊤


Its induction principle is 


⊤-rec : (P : ⊤ → Set) → P tt → (t : ⊤) → P t
⊤-rec P Ptt tt = Ptt


We know from logic that there is no elimination rule associated to truth. We 
can however write the rule which corresponds to this induction principle: 


Γ ⊢ P    Γ ⊢ ⊤
--------------- (⊤E)
     Γ ⊢ P

This is not very interesting from a logical point of view: if we know that P holds
and ⊤ holds then we can deduce that P holds, which we already knew.

6.5.4 Empty type. The empty type is defined in Data.Empty by 

data ⊥ : Set where


and corresponds to falsity. It has no constructor, thus no introduction rule. The 
associated induction principle is 


⊥-elim : (P : ⊥ → Set) → (x : ⊥) → P x
⊥-elim P ()

The non-dependent variant of this principle

⊥-elim : (P : Set) → ⊥ → P
⊥-elim P ()

corresponds to the explosion principle, which is the associated elimination rule

Γ ⊢ ⊥
------ (⊥E)
Γ ⊢ P

We recall from section 6.4.2 that () is the empty pattern in Agda, which
indicates here that there are no cases to handle when matching on a value of
type ⊥.


6.5.5 Negation. As expected, negation is defined in Relation.Nullary by 


¬_ : Set → Set
¬ A = A → ⊥


For instance, the formula A ⇒ ¬¬A can be proved with


nni : {A : Set} → A → ¬ (¬ A)
nni x f = f x
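
Two further intuitionistically valid formulas involving negation can be proved in the same style. The following is a self-contained sketch (the names contraposition and nnn, as well as the fixity declaration, are our own choices): it shows (A ⇒ B) ⇒ ¬B ⇒ ¬A and ¬¬¬A ⇒ ¬A.

```agda
data ⊥ : Set where

infix 3 ¬_

¬_ : Set → Set
¬ A = A → ⊥

-- Since ¬ A unfolds to A → ⊥, a proof of ¬ A may take an argument a : A.
contraposition : {A B : Set} → (A → B) → ¬ B → ¬ A
contraposition f nb a = nb (f a)

-- Three negations can always be reduced to one.
nnn : {A : Set} → ¬ (¬ (¬ A)) → ¬ A
nnn k a = k (λ na → na a)
```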


6.5.6 Coproduct. Coproducts (or sums) are defined in Data.Sum by


data _⊎_ (A : Set) (B : Set) : Set where
  inj₁ : A → A ⊎ B
  inj₂ : B → A ⊎ B


The constructor inj₁ (resp. inj₂) is called the injection of A (resp. B) into
A ⊎ B. The notation comes from the fact that the coproduct A ⊎ B corresponds
to the disjoint union if we see the types A and B as sets. Logically, coproduct
corresponds to disjunction. As an illustration, the commutativity of disjunction
is shown by


⊎-comm : (A B : Set) → A ⊎ B → B ⊎ A
⊎-comm A B (inj₁ x) = inj₂ x
⊎-comm A B (inj₂ y) = inj₁ y


As a more involved example, a proof of

(A ∨ ¬A) ⇒ ¬¬A ⇒ A

is, following the proof of theorem 2.5.1.1,

lem-raa : {A : Set} → A ⊎ ¬ A → ¬ (¬ A) → A
lem-raa (inj₁ a) k = a
lem-raa (inj₂ a') k = ⊥-elim (k a')


The induction principle is 


⊎-rec : {A B : Set} → (P : A ⊎ B → Set) →
        ((x : A) → P (inj₁ x)) → ((y : B) → P (inj₂ y)) →
        (u : A ⊎ B) → P u
⊎-rec P P₁ P₂ (inj₁ x) = P₁ x
⊎-rec P P₁ P₂ (inj₂ y) = P₂ y

The two constructors correspond to the two introduction rules 


  Γ ⊢ A                     Γ ⊢ B
----------- (∨ˡI)         ----------- (∨ʳI)
 Γ ⊢ A ∨ B                 Γ ⊢ A ∨ B

and the non-dependent variant of the induction principle to the elimination rule

Γ, A ⊢ P    Γ, B ⊢ P    Γ ⊢ A ∨ B
----------------------------------- (∨E)
               Γ ⊢ P


Decidable types. A type A is decidable when we know whether it is inhabited or
not, i.e. we have a proof of A ∨ ¬A. We could thus define the predicate

Dec : Set → Set
Dec A = A ⊎ ¬ A

A proof of Dec A is a proof that A is decidable: by definition of the disjunction, it
is either of the form inj₁ p, where p is a proof of A, or inj₂ q, where q is a proof
of ¬ A. Agda people like to write yes (resp. no) instead of inj₁ (resp. inj₂),
because it answers the question: is A provable? In the standard library, the
above type is thus actually defined in the module Relation.Nullary as follows:

data Dec (A : Set) : Set where
  yes : A → Dec A
  no  : ¬ A → Dec A


Since the logic of Agda is intuitionistic, the formula A ∨ ¬A is not provable for
an arbitrary type A, and not every type is decidable. However, it can be proved that no
type is not decidable, see section 2.3.5:

nndec : (A : Set) → ¬ (¬ (Dec A))
nndec A n = n (no (λ a → n (yes a)))


This is further discussed in section 6.6.8. 
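
Although decidability cannot be shown for an arbitrary type, it can of course be shown for particular ones. As a minimal sketch (the names ⊤-dec and ⊥-dec are ours, and all the types are redefined locally), truth and falsity are both decidable:

```agda
data ⊥ : Set where

data ⊤ : Set where
  tt : ⊤

infix 3 ¬_

¬_ : Set → Set
¬ A = A → ⊥

data Dec (A : Set) : Set where
  yes : A → Dec A
  no  : ¬ A → Dec A

-- ⊤ is inhabited, so we answer yes and exhibit its element.
⊤-dec : Dec ⊤
⊤-dec = yes tt

-- ⊥ is not inhabited: a proof of ¬ ⊥, i.e. of ⊥ → ⊥, is the identity.
⊥-dec : Dec ⊥
⊥-dec = no (λ x → x)
```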


6.5.7 Π-types. A defining feature of Agda is that it uses dependent types: a


type can depend on a term. As we will see, some of the connectives admit 
dependent generalizations. The first one is the generalization of function types 


A → B

to dependent function types

(x : A) → B


where x might occur in B. These model functions where the type B of the returned 
value depends on the argument x. A typical example is the replicate function 
of section 6.4.7, which takes a natural number n as argument and returns a 
vector of length n. Its type is the dependent function type 


replicate : {A : Set} → A → (n : ℕ) → Vec A n

Dependent function types are also called Π-types, and often written

Π(x : A).B

instead of using the above notation. Although there is a built-in notation in
Agda, one can define an inductive type for those by

data Π (A : Set) (B : A → Set) : Set where
  Λ : ((a : A) → B a) → Π A B

Namely, an element of the Π-type Π A B is simply a dependent function

(x : A) → B x


From a logical point of view, it corresponds to a universal quantification which 
is bounded (we specify the type A over which the variable ranges): the above 
type corresponds to the logical formula 


∀x ∈ A.B(x)

and a proof of such a formula corresponds to a function which to every x in A
associates a proof of B(x). This is why Agda also allows the notation

∀ x → B x

for the above type, if one is inclined to leave A implicit.

Exercise 6.5.7.1. Show that the type

Π Bool (λ { false → A ; true → B })

is isomorphic to A × B.


6.5.8 Σ-types. Σ-types are a dependent variant of product types, whose
elements are of the form a , b where a is of type A and b is of type B a: the type
of the second component depends on the first component. They are defined in
Data.Product by

data Σ (A : Set) (B : A → Set) : Set where
  _,_ : (a : A) → B a → Σ A B


(for technical reasons the actual definition in Agda is done using a record, but 
is equivalent to the above one). As for usual products, we can define two pro- 
jections by 


proj₁ : {A : Set} {B : A → Set} → Σ A B → A
proj₁ (a , b) = a

and

proj₂ : {A : Set} {B : A → Set} → (s : Σ A B) → B (proj₁ s)
proj₂ (a , b) = b


Again, in the second projection, note that the returned type depends on the 
first component. 
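
For a concrete element of a Σ-type, consider pairs consisting of a length and a vector of booleans of that length. The following sketch is self-contained (all types are redefined locally and the name ex is ours):

```agda
data ℕ : Set where
  zero : ℕ
  suc  : ℕ → ℕ

data Bool : Set where
  false true : Bool

infixr 5 _∷_

data Vec (A : Set) : ℕ → Set where
  []  : Vec A zero
  _∷_ : {n : ℕ} → A → Vec A n → Vec A (suc n)

data Σ (A : Set) (B : A → Set) : Set where
  _,_ : (a : A) → B a → Σ A B

-- The type of the second component, Vec Bool (suc (suc zero)),
-- is determined by the value of the first component.
ex : Σ ℕ (λ n → Vec Bool n)
ex = suc (suc zero) , (true ∷ false ∷ [])
```

Replacing the first component by suc zero while keeping the same vector would be rejected, since the vector would no longer have the announced length.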

Logical interpretation. From a logical point of view, the type

Σ A B

can be read as a bounded existential quantification and corresponds to what
one would usually write

∃x ∈ A.B(x)

A proof of such a formula is a pair consisting of an element x of A and a
proof that x satisfies B(x). In a set theoretic interpretation, it corresponds to
constructing sets by comprehension, i.e. the set of elements x of A such that
B(x) is satisfied, what we would usually write

{x ∈ A | B(x)}


For instance, in set theory, given a function f : A → B (from a set A to a set B),
its image Im(f) is the subset of B consisting of elements in the image of f. It
is formally defined as

Im(f) = {y ∈ B | ∃x ∈ A.f(x) = y}


This immediately translates as a definition in Agda, with two Σ-types (one for
the comprehension and one for the existential quantification):

Im : {A B : Set} (f : A → B) → Set
Im {A} {B} f = Σ B (λ y → Σ A (λ x → f x ≡ y))


and one can for instance show that every function f : A → B has a right inverse
(or section) g : Im(f) → A:

sec : {A B : Set} (f : A → B) → Im f → A
sec f (y , x , p) = x


The axiom of choice. In a similar vein, the axiom of choice states that for every
relation R ⊆ A × B satisfying

∀x ∈ A.∃y ∈ B.(x, y) ∈ R        (6.1)

there is a function f : A → B such that

∀x ∈ A.(x, f(x)) ∈ R


In section 6.5.9 below, we define a type Rel A B corresponding to relations 
between two types A and B, from which we can easily write a type corresponding 
to the axiom of choice. What might be a surprise to you is that this axiom is 
actually provable in Agda: 


AC : {A B : Set} (R : Rel A B) →
     ((x : A) → Σ B (λ y → R x y)) →
     Σ (A → B) (λ f → ∀ x → R x (f x))
AC R f = (λ x → proj₁ (f x)) , (λ x → proj₂ (f x))


The reason is that the argument, which corresponds to the proof of (6.1), is 
constructive: it is a function which to every element x of type A associates 
a pair consisting of an element y of B and a proof that (x,y) belongs to the 
relation. By projecting it on the first component, we thus obtain the function f 
we are looking for (associating an element of B to each element of A), and we 
can use the second component to prove that it satisfies the required property. 

However, this is not what people usually have in mind when thinking of the
axiom of choice: they rather have a “classical” variant where we do not have
access to the proof of (6.1), but only know of its existence. A more reasonable
description is thus the following formalization of the axiom of choice, where
the double negation has killed the contents of the proof, see section 2.5.9:

postulate CAC : {A B : Set} (R : Rel A B) →
                ¬ ¬ ((x : A) → Σ B (λ y → R x y)) →
                ¬ ¬ Σ (A → B) (λ f → ∀ x → R x (f x))

This is discussed in further detail in section 9.3.4.


6.5.9 Predicates. Predicates can be expressed in Agda; we will discuss this 
now. 


Truth values. In classical logic, the set 𝔹 of booleans is the set of truth values,
i.e. the values in which we evaluate predicates: a predicate on a set A can either
be false or true, and can thus be modeled as a function A → 𝔹. In Agda, we
use intuitionistic logic and therefore we are not so much interested in whether a
predicate is true or not, but rather in its proofs, so that the role of truth values
is now played by Set. A predicate P on a type A can thus be seen as a term of
type

A → Set

which to every element x of A associates the type of proofs of P x.


Relations. In classical mathematics, a relation R on a set A is a subset of A × A,
see also appendix A.1. An element x of A is said to be in relation with an
element y when (x, y) ∈ R. A relation on A can also be encoded as a function

A × A → 𝔹

or, equivalently by currying, as a function

A → A → 𝔹

In this representation, x is in relation with y when R(x, y) = 1.

In intuitionistic type theory, we can describe the type of relations on a
type A as the following type Rel A, obtained by replacing the set
of truth values with Set in the above description:

Rel : Set → Set₁
Rel A = A → A → Set

(this definition can be found in Relation.Binary in the standard library). For
instance, the usual order relation _≤_ on natural numbers (see below) can be
given the type Rel ℕ, and the equality relation _≡_ on a type A (see section 6.6)
can be given the type Rel A.


Inductive predicates. In Agda, we can define types inductively, and these types 
can depend on other types (inductive types can have parameters and indices). 
This means that we can define predicates by induction! For instance, the pred- 
icate on natural numbers of being even can be defined by induction by 


data isEven : ℕ → Set where
  even-z : isEven zero
  even-s : {n : ℕ} → isEven n → isEven (suc (suc n))


We inductively state that 0 is even, and that if n is even then n + 2 is even. In
other words, this corresponds to the definition of the set E ⊆ ℕ of even numbers
as the smallest set of numbers such that 0 ∈ E and n ∈ E ⇒ n + 2 ∈ E.
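
To get a feel for this predicate, here is a self-contained sketch (the names four-even and one-not-even are ours, and the types are redefined locally): a proof that 4 is even is obtained by applying even-s twice, while 1 cannot be even because no constructor produces a proof of that type.

```agda
data ℕ : Set where
  zero : ℕ
  suc  : ℕ → ℕ

data ⊥ : Set where

data isEven : ℕ → Set where
  even-z : isEven zero
  even-s : {n : ℕ} → isEven n → isEven (suc (suc n))

-- 4 = (0 + 2) + 2
four-even : isEven (suc (suc (suc (suc zero))))
four-even = even-s (even-s even-z)

-- Neither zero nor suc (suc n) matches suc zero,
-- hence the empty pattern is accepted.
one-not-even : isEven (suc zero) → ⊥
one-not-even ()
```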

Similarly, the order relation on natural numbers can be defined with the 
following inductive type: 


data _≤_ : ℕ → ℕ → Set where
  z≤n : {n : ℕ} → zero ≤ n
  s≤s : {m n : ℕ} (m≤n : m ≤ n) → suc m ≤ suc n


This states that it is the smallest relation on natural numbers such that 0 ≤ n
for every n, and m ≤ n implies m + 1 ≤ n + 1. One of the main interests of
defining predicates or relations inductively is of course that we can then reason
by induction over those. For instance, we can show that the order relation is
reflexive


≤-refl : {n : ℕ} → (n ≤ n)
≤-refl {zero} = z≤n
≤-refl {suc n} = s≤s ≤-refl


and transitive 




≤-trans : {m n p : ℕ} → (m ≤ n) → (n ≤ p) → (m ≤ p)
≤-trans z≤n n≤p = z≤n
≤-trans (s≤s m≤n) (s≤s n≤p) = s≤s (≤-trans m≤n n≤p)


Because of the support in Agda for reasoning by induction (and dependent
pattern matching), this is often the best choice of style for defining predicates,
leading to the simplest proofs, although there are many other possibilities. In
order to illustrate this, the order on natural numbers could have been defined
using a Σ-type, based on the classical equivalence, for m, n ∈ ℕ,

m ≤ n ⇔ ∃m' ∈ ℕ. m + m' = n

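
A definition following this equivalence could look as follows. This is only a sketch under our own assumptions: the name _≤'_, the fixity declarations and the example one≤three are ours, and all the types are redefined locally so that the file is self-contained.

```agda
data ℕ : Set where
  zero : ℕ
  suc  : ℕ → ℕ

infixl 6 _+_

_+_ : ℕ → ℕ → ℕ
zero  + n = n
suc m + n = suc (m + n)

infix 4 _≡_

data _≡_ {A : Set} (x : A) : A → Set where
  refl : x ≡ x

data Σ (A : Set) (B : A → Set) : Set where
  _,_ : (a : A) → B a → Σ A B

infix 4 _≤'_

-- m ≤' n holds when we can exhibit some m' such that m + m' ≡ n.
_≤'_ : ℕ → ℕ → Set
m ≤' n = Σ ℕ (λ m' → m + m' ≡ n)

-- 1 ≤' 3 is witnessed by m' = 2: the term 1 + 2 reduces to 3,
-- so that refl is accepted as a proof of the equality.
one≤three : suc zero ≤' suc (suc (suc zero))
one≤three = suc (suc zero) , refl
```

With such a definition, a proof of inequality carries the difference between the two numbers, but proofs about it have to go through properties of addition instead of a direct induction on the derivation.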
We could also have defined it by 


le : ℕ → ℕ → Bool
le zero n = true
le (suc m) zero = false
le (suc m) (suc n) = le m n

_≤_ : ℕ → ℕ → Set
m ≤ n = le m n ≡ true


We leave as an exercise to the reader to show reflexivity and transitivity with 
those formalizations. 

Finally, as a more involved example, the implicational fragment of
intuitionistic natural deduction is formalized in section 7.2: here, the relation Γ ⊢ A
between a context Γ and a type A, which is true when the sequent is provable,
is defined inductively.


6.6 Equality 


Even equality is defined as an inductive type in Agda. The definition is given 
in Relation.Binary.PropositionalEquality by 


data _≡_ {A : Set} (x : A) : A → Set where
  refl : x ≡ x


The equality is typed, in the sense that we can compare only elements of the 
same type A. Moreover, there is only one way to show that two elements are 
equal: it is when they are the same! Because of dependent pattern matching, 
we will see that it is not as dumb as it might seem at first. 


6.6.1 Equality and pattern matching. As a first proof with equality, let
us show that the successor function on natural numbers is injective. In other
words, for all natural numbers m and n, we have

m + 1 = n + 1 ⇒ m = n

This can be formalized as follows:

suc-injective : {m n : ℕ} → suc m ≡ suc n → m ≡ n
suc-injective refl = refl


In order to understand how such a proof works, let us study this proof step by 
step and reveal the implicit arguments m and n. We start with 


suc-injective : {m n : ℕ} → suc m ≡ suc n → m ≡ n
suc-injective {m} {n} p = ?

By pattern matching on p (using the shortcut C-c C-c), the proof is transformed
into

suc-injective : {m n : ℕ} → suc m ≡ suc n → m ≡ n
suc-injective {m} {.m} refl = ?

In order to do this, Agda uses the fact that p can only be the constructor refl,
but it also knows that, in this case, the variable m must be equal to n. This
explains the .m for the second optional argument: it means that it is not really
an argument but something which has to be equal to m. We are thus left with
proving m ≡ m, and we can conclude by using refl. Most proofs involving equality
are either performed in this way or by using the main properties of equality
shown in the next section.


6.6.2 Main properties. Apart from reflexivity, which is ensured by the con- 
structor refl, equality can be shown to be a congruence: it is symmetric, tran- 
sitive and compatible with every operation. 


sym : {A : Set} {x y : A} → x ≡ y → y ≡ x
sym refl = refl

trans : {A : Set} {x y z : A} → x ≡ y → y ≡ z → x ≡ z
trans refl refl = refl

cong : {A B : Set} (f : A → B) {x y : A} → x ≡ y → f x ≡ f y
cong f refl = refl

Two other important operations on equality are substitutivity, which allows us to
transport the elements of a type along an equality

subst : {A : Set} (P : A → Set) → {x y : A} → x ≡ y → P x → P y
subst P refl p = p

and coercion, which allows us to convert an element of a type into an element of
an equal type

coe : {A B : Set} → A ≡ B → A → B
coe p x = subst (λ A → A) p x


The properties of equality will be discussed again in section 9.1. 
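
As a typical use of substitutivity, a vector can be transported along an equality between lengths. The following is a self-contained sketch (the function name cast is ours, and the types are redefined locally):

```agda
data ℕ : Set where
  zero : ℕ
  suc  : ℕ → ℕ

data Vec (A : Set) : ℕ → Set where
  []  : Vec A zero
  _∷_ : {n : ℕ} → A → Vec A n → Vec A (suc n)

data _≡_ {A : Set} (x : A) : A → Set where
  refl : x ≡ x

subst : {A : Set} (P : A → Set) → {x y : A} → x ≡ y → P x → P y
subst P refl p = p

-- The predicate transported here is Vec A : ℕ → Set.
cast : {A : Set} {m n : ℕ} → m ≡ n → Vec A m → Vec A n
cast {A} p v = subst (Vec A) p v
```

Such a function is needed, for instance, to turn a vector of length n + zero into a vector of length n, since these two lengths are provably equal but not convertible.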


6.6.3 Half of even numbers. As an application of the above properties, let 
us formalize the fact that every even number has a half, following the proof 
strategy presented in section 2.3. In traditional logical notation, we have to 
show 


∀n ∈ ℕ.isEven(n) ⇒ ∃m ∈ ℕ.m + m = n

The predicate isEven, which indicates whether a natural number is even or not,
was already defined in section 6.5.9 and we can thus formalize our property as
follows

even-half : {n : ℕ} → isEven n → Σ ℕ (λ m → m + m ≡ n)
even-half even-z = zero , refl
even-half (even-s e) with even-half e
even-half (even-s e) | m , p =
  suc m , cong suc (trans (+-suc m m) (cong suc p))


In the second case, we have by induction a number m such that m + m = n,
and we want to construct a half for n + 2: this half will be m + 1, and we can
show that it is a half using the following reasoning

(m + 1) + (m + 1) = (m + (m + 1)) + 1    by definition of addition,
                  = ((m + m) + 1) + 1    by the lemma below,
                  = (n + 1) + 1          since m + m = n.

This can be implemented using the transitivity of equality (trans) as well as
the fact that it is a congruence (we use cong suc to deduce m + 1 = n + 1
from m = n). We also use, as an auxiliary lemma, the fact that

m + (n + 1) = (m + n) + 1

which can be shown by induction on m as follows:

+-suc : (m n : ℕ) → m + suc n ≡ suc (m + n)
+-suc zero n = refl
+-suc (suc m) n = cong suc (+-suc m n)


6.6.4 Reasoning. The above handling of equality can be hard to track or
read. A more natural way of presenting proofs can be achieved by using the
≡-Reasoning module, which displays equalities in a way closer to the usual one in
mathematics. These helper functions can be accessed with

open ≡-Reasoning

Then, one can write a proof of t₀ ≡ tₙ in the form

begin t₀ ≡⟨ P₁ ⟩ t₁ ≡⟨ P₂ ⟩ ... ≡⟨ Pₙ ⟩ tₙ ∎

where Pᵢ is a proof of tᵢ₋₁ ≡ tᵢ. For instance, a proof of the commutativity
of addition over natural numbers using this technique is

+-comm : (m n : ℕ) → m + n ≡ n + m
+-comm m zero = +-zero m
+-comm m (suc n) =
  begin
    (m + suc n)  ≡⟨ +-suc m n ⟩
    suc (m + n)  ≡⟨ cong suc (+-comm m n) ⟩
    suc (n + m)  ∎

The second case directly mimics the usual mathematical proof

m + (n + 1) = (m + n) + 1    by +-suc,
            = (n + m) + 1    by induction hypothesis.

For comparison, a direct proof of this fact, using the properties of equality of
section 6.6.2, would have been

+-comm : (m n : ℕ) → m + n ≡ n + m
+-comm m zero = +-zero m
+-comm m (suc n) = trans (+-suc m n) (cong suc (+-comm m n))


As usual in Agda, these notations are not built-in but defined in the standard 
library by 


begin_ : {A : Set} {x y : A} → x ≡ y → x ≡ y
begin_ x≡y = x≡y

_≡⟨_⟩_ : {A : Set} (x {y z} : A) → x ≡ y → y ≡ z → x ≡ z
_ ≡⟨ x≡y ⟩ y≡z = trans x≡y y≡z

_∎ : {A : Set} (x : A) → x ≡ x
_ ∎ = refl


6.6.5 Definitional equality. In Agda, two terms which are convertible (i.e.
reduce to a common term) are considered to be “equal”. The equality we are
referring to here is not ≡, but the equality which is internal to Agda, sometimes
referred to as definitional equality: one cannot distinguish between two
definitionally equal terms. For instance, over natural numbers, the term zero + n is
definitionally equal to n, because this is the way we defined addition. Of course,
definitional equality implies equality by refl:

+-zero' : (n : ℕ) → zero + n ≡ n
+-zero' n = refl

On the other hand, the terms n + zero and n are not definitionally equal (there
is nothing in the definition of addition which immediately allows us to conclude
that). The equality between these two terms can of course be proved, but
requires some more work:

+-zero : (n : ℕ) → n + zero ≡ n
+-zero zero = refl
+-zero (suc n) = cong suc (+-zero n)


Because of this, subtle variations in the definitions, even though they axiomatize
isomorphic structures, can have a large impact on the length of the proofs, and
one should take care to choose the “best definition” for a concept, which
requires some practice. For instance, for properties involving multiple natural
numbers, the choice of the one on which we perform the induction can drastically
change the size of the proof.


6.6.6 More properties with equality. Having introduced the notion of equality,
we show here some more examples of properties involving it, for natural
numbers and lists. 


Natural numbers. We can show that zero is not the successor of any natural 
number (which is one of the axioms of Presburger and Peano arithmetic, see 
section 5.2.4), by a direct use of pattern matching: 


zero-suc : {n : ℕ} → zero ≡ suc n → ⊥
zero-suc ()


Namely, when matching on the argument of type zero ≡ suc n, Agda knows
that there can be no proof of such a type because zero and suc n do not begin
with the same constructor. We can thus use the empty pattern () to indicate 
that the pattern matching contains no cases to handle. This behavior is detailed 
in section 8.4.5. 

We can show that addition is associative by a simple induction: 


+-assoc : (m n o : ℕ) → (m + n) + o ≡ m + (n + o)
+-assoc zero n o = refl
+-assoc (suc m) n o = cong suc (+-assoc m n o)


Showing that multiplication is associative follows the same pattern, but requires 
some algebraic reasoning 


*-assoc : (m n o : ℕ) → (m * n) * o ≡ m * (n * o)
*-assoc zero n o = refl
*-assoc (suc m) n o = begin
  (m * n + n) * o      ≡⟨ *-+-dist-r (m * n) n o ⟩
  m * n * o + n * o    ≡⟨ cong (λ m → m + n * o) (*-assoc m n o) ⟩
  m * (n * o) + n * o  ∎

where we use the fact that multiplication distributes over the addition:

*-+-dist-r : (m n o : ℕ) → (m + n) * o ≡ m * o + n * o
*-+-dist-r zero n o = refl
*-+-dist-r (suc m) n o = begin
  (m + n) * o + o      ≡⟨ cong (λ n → n + o) (*-+-dist-r m n o) ⟩
  (m * o + n * o) + o  ≡⟨ +-assoc (m * o) (n * o) o ⟩
  m * o + (n * o + o)  ≡⟨ cong (λ n → m * o + n) (+-comm (n * o) o) ⟩
  m * o + (o + n * o)  ≡⟨ sym (+-assoc (m * o) o (n * o)) ⟩
  m * o + o + n * o    ∎


Lists. Concatenation of lists satisfies similar properties to addition of natural
numbers (after all, the type ℕ of natural numbers is isomorphic to the type List
⊤ of lists whose elements are all tt). Namely, we can show that the empty list
is a neutral element for concatenation, on the left

++-empty' : {A : Set} → (l : List A) → [] ++ l ≡ l
++-empty' l = refl

and on the right

++-empty : {A : Set} → (l : List A) → l ++ [] ≡ l
++-empty [] = refl
++-empty (x ∷ l) = cong (λ l → x ∷ l) (++-empty l)


and that concatenation is associative 


++-assoc : {A : Set} → (l l' l'' : List A) →
           ((l ++ l') ++ l'') ≡ (l ++ (l' ++ l''))
++-assoc [] l' l'' = refl
++-assoc (x ∷ l) l' l'' = cong (λ l → x ∷ l) (++-assoc l l' l'')

However, contrary to addition, concatenation is not commutative. To wit, if it
were, then the concatenation of the lists [1] and [2] in both orders would be the
same, which would mean that the lists [1, 2] and [2, 1] would be the same, which
we know they are not. This reasoning can be formalized in Agda as follows:

++-not-comm :
  ¬ ({A : Set} → (l l' : List A) → (l ++ l') ≡ (l' ++ l))
++-not-comm f with f (1 ∷ []) (2 ∷ [])
++-not-comm f | ()


We can also show that the concatenation of two lists produces a list whose 
length is the sum of the lengths of the original lists: 


++-length : {A : Set} → (l l' : List A) →
            length (l ++ l') ≡ length l + length l'
++-length [] l' = refl
++-length (x ∷ l) l' = cong suc (++-length l l')


Finally, let us present an all-time classic. We can define a function rev which 
reverses the order of the elements of a list: we show that applying this function 
twice to a list gets us back to the original list. We begin by introducing a 
function snoc (this is cons backwards) which adds an element at the end of a 
list: 

snoc : {A : Set} → List A → A → List A
snoc [] x = x ∷ []
snoc (y ∷ l) x = y ∷ (snoc l x)

We can then define the reversal function by adding all the elements of a list at
the end of the empty list:

rev : {A : Set} → List A → List A
rev [] = []
rev (x ∷ l) = snoc (rev l) x


We can then show that applying this function twice does not change the list 
given in the argument: 


i] 
hb 


rev-rev : {A : Set} 7 (1 : List A) 7 rev (rev 1) = 
rev-rev [] = refl 
rev-rev (x : 1) = 

trans (rev-snoc (rev 1) x) (cong (A 17x: 1) Crev-rev 1)) 


This proof first requires showing the following auxiliary lemma, stating that
reversing a list l with x as last element will produce a list with x as first element,
followed by the reversal of the rest of the list:


rev-snoc : {A : Set} → (l : List A) → (x : A) →
           rev (snoc l x) ≡ x ∷ (rev l)
rev-snoc [] x = refl
rev-snoc (y ∷ l) x = cong (λ l → snoc l y) (rev-snoc l x)
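

For readers who prefer to experiment before proving, the definitions above can be transcribed in OCaml and the two properties checked on a sample list. This is only a sketch: the sample list is an arbitrary choice, and the assertions test single instances, which is precisely what the Agda proofs above replace by a general argument.

```ocaml
(* snoc adds the element x at the end of the list l *)
let rec snoc l x =
  match l with
  | [] -> [x]
  | y :: l -> y :: snoc l x

(* rev reverses a list by adding each head at the end of the reversed tail *)
let rec rev = function
  | [] -> []
  | x :: l -> snoc (rev l) x

let () =
  let l = [1; 2; 3; 4] in
  (* instance of rev-snoc: rev (snoc l x) = x :: rev l *)
  assert (rev (snoc l 5) = 5 :: rev l);
  (* instance of rev-rev: rev (rev l) = l *)
  assert (rev (rev l) = l)
```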


6.6.7 The J rule. If we define equality as 


data _≡_ {A : Set} : A → A → Set where
  refl : {x : A} → x ≡ x


the associated induction principle is called the J rule: 


J : {A : Set} {x y : A} (p : x ≡ y)
    (P : (x y : A) → x ≡ y → Set)
    (r : (x : A) → P x x refl)
    → P x y p
J {A} {x} {.x} refl P r = r x


It reads as follows: in order to prove a property P depending on a proof p of 
equality between two elements x and y, it is enough to prove it when this proof 
is refl. 

In practice, we have seen in section 6.6 that the definition usually taken in
Agda is slightly different (it uses a parameter instead of an index for the first
argument of type A), so that the resulting induction principle is a variant of the
above one:


J : {A : Set} (x : A) (P : (y : A) → x ≡ y → Set)
    (r : P x refl) (y : A) (p : x ≡ y) → P y p
J x P r .x refl = r


6.6.8 Decidable equality. Recall from section 6.5.6 that a type A is decidable
when either A or ¬A is provable, and we write Dec A for the type of proofs of
decidability of A: such a proof is either yes p, where p is a proof of A, or no
q, where q is a proof of ¬A. A relation R on a type A is decidable when the type
R x y is decidable for all elements x and y of type A. The standard library
defines, in the module Relation.Binary, the following predicate:


Decidable : {A : Set} (R : A → A → Set) → Set
Decidable {A} R = (x y : A) → Dec (R x y)


A term of type Decidable R is a proof that the relation R is decidable. 

A type A has decidable equality when the equality relation _≡_ on A is
decidable. This means that we have a function (i.e. an algorithm) which is able 
to determine, given two elements of A, whether they are equal or not. To be 
precise, we not only have the information of whether they are equal or not, 
which would be a boolean, but actually a proof of their equality or a proof of 
their inequality (see section 8.4.5 for a use of this). 

Equality on any finite type is always decidable. For instance, in the case of 
booleans: 


_≟_ : Decidable {A = Bool} _≡_
false ≟ false = yes refl
false ≟ true = no (λ ())
true ≟ false = no (λ ())
true ≟ true = yes refl


The type of natural numbers also has decidable equality: 


_≟_ : Decidable {A = ℕ} _≡_
zero ≟ zero = yes refl
zero ≟ suc n = no (λ ())
suc m ≟ zero = no (λ ())
suc m ≟ suc n with m ≟ n
suc m ≟ suc .m | yes refl = yes refl
suc m ≟ suc n | no ¬p = no (λ p → ¬p (suc-injective p))


However, we do not expect equality to be decidable on the type ℕ → ℕ.
One reason is that our reasoning techniques are very limited on functions, in
particular we cannot perform pattern matching on functions, and thus cannot
perform a proof in the same spirit as above. The other reason is that, intuitively,
two functions f and g on natural numbers are equal when f(n) = g(n) for
every natural number n (this is not always true though, see section 9.1.5), and
we do not expect that there is an algorithm which will be able to compare all
the images of f and g on every natural number n in finite time...
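

To make the last point concrete, here is a small OCaml sketch of the best a program can do in finite time: compare two functions on finitely many arguments. The names agree_up_to, f, g and h are hypothetical and chosen for illustration only. The function h shows why such a bounded check is not a decision procedure for equality of functions: it agrees with f on every argument below 1000 and differs only afterwards.

```ocaml
(* agree_up_to bound f g checks that f n = g n for every 0 <= n < bound *)
let agree_up_to bound f g =
  let rec check n = n >= bound || (f n = g n && check (n + 1)) in
  check 0

(* two syntactically different implementations of the same function *)
let f n = n * (n + 1)
let g n = n * n + n

(* a function which differs from f only at n = 1000 *)
let h n = if n = 1000 then 0 else n * (n + 1)

let () =
  assert (agree_up_to 1000 f g);
  (* the bounded test cannot tell f and h apart... *)
  assert (agree_up_to 1000 f h);
  (* ...although they are different functions *)
  assert (not (agree_up_to 1001 f h))
```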


6.6.9 Heterogeneous equality. We would finally like to present a variant of
equality due to McBride [McB00], called heterogeneous equality, which allows
comparing terms of (seemingly) distinct types. In order to understand its use, let us try to
show that the concatenation of vectors is associative. Given three vectors l, l'
and l'', with elements of type A, of respective lengths m, n and o, we thus want
to show


(l ++ l') ++ l'' ≡ l ++ (l' ++ l'')


...except that this expression does not make sense! Namely, the equality ≡ can
only be used to compare terms of the same type and, here, this is not the case:
the left and right sides respectively have types


Vec A ((m + n) + o)          Vec A (m + (n + o))


which are not the same. Of course, the two types are propositionally equal: we 
can prove 


Vec A ((m + n) + o) ≡ Vec A (m + (n + o))
by
cong (Vec A) (+-assoc m n o)


But the two types are not definitionally equal, which is what is required in order
to compare terms with ≡.




Proof with standard equality. In order to perform our comparison, we can use 
coe and the above propositional equality in order to cast one of the members 
to have the same type as the other one. Namely, the term 


coe (cong (Vec A) (+-assoc m n o))
has type
Vec A ((m + n) + o) → Vec A (m + (n + o))


and we can use it to “cast” (l ++ l') ++ l'' in order to change its type to the
same one as l ++ (l' ++ l''), after which we can compare the two with ≡. We
can finally prove associativity of concatenation of vectors as follows:


++-assoc : {A : Set} {m n o : ℕ} →
           (l : Vec A m) → (l' : Vec A n) → (l'' : Vec A o) →
           coe (cong (Vec A) (+-assoc m n o))
             ((l ++ l') ++ l'') ≡ l ++ (l' ++ l'')
++-assoc [] l' l'' = refl
++-assoc {_} {suc m} {n} {o} (x ∷ l) l' l'' =
  ∷-cong x (+-assoc m n o) (++-assoc l l' l'')


The above proof uses the following auxiliary lemma which states that if l and
l' are propositionally equal vectors, up to propositional equality of their types
as above, then x ∷ l and x ∷ l' are also propositionally equal:


∷-cong : {A : Set} → {m n : ℕ} {l : Vec A m} {l' : Vec A n} →
         (x : A) → (p : m ≡ n) → coe (cong (Vec A) p) l ≡ l' →
         coe (cong (Vec A) (cong suc p)) (x ∷ l) ≡ x ∷ l'
∷-cong x refl refl = refl


As you can observe, the statement of those properties is considerably obscured 
by the use of coe, which is used to coerce the type of terms so that they can be 
compared to other terms, as explained above. 


Proof with heterogeneous equality. In order to overcome this problem, we can use 
the heterogeneous equality relation, also sometimes called John Major equality, 
which is defined by 


data _≅_ : {A B : Set} (x : A) (y : B) → Set where
  refl : {A : Set} {x : A} → x ≅ x

in the module Relation.Binary.HeterogeneousEquality. It is a variant of
propositional equality, which allows comparing two elements x and y of
distinct types A and B. It is however a reasonable notion of equality because the
constructor refl only allows constructing a heterogeneous equality when A
and B are the same. This ability to compare elements of distinct types allows
formulating and proving the associativity of concatenation of vectors in a much easier way:


++-assoc : {A : Set} {m n o : ℕ}
           (l : Vec A m) (l' : Vec A n) (l'' : Vec A o) →
           ((l ++ l') ++ l'') ≅ (l ++ (l' ++ l''))
++-assoc [] l' l'' = refl
++-assoc {_} {suc m} {n} {o} (x ∷ l) l' l'' =
  ∷-cong x (+-assoc m n o) (++-assoc l l' l'')


with the preliminary lemma


∷-cong : {A : Set} {m n : ℕ} {l : Vec A m} {l' : Vec A n} →
         (x : A) → m ≡ n → l ≅ l' → x ∷ l ≅ x ∷ l'
∷-cong x refl refl = refl


The reader should be warned that heterogeneous equality is not entirely
satisfactory. Firstly, if x and y are two elements of the same type A, we cannot
formally show that x ≅ y implies x ≡ y (unless we assume axiom K, see
section 9.1.6). Secondly, being able to compare elements of any two types A and B
seems quite worrying: the only thing we really need here is to compare elements
of such types when A ≡ B. A more satisfactory definition is given in section 9.5.2.


6.7 Proving programs in practice 


We shall now briefly explain and illustrate how we can prove a program is 
correct. Of course, there is no universally accepted notion of what we mean by 
the correctness of a program: it only means that it agrees with a specification, 
which can usually be expressed as a logical formula, and whose definition is 
left to the person certifying the program. We can however classify correctness 
properties in three rough families. 


— Absence of errors: the program always uses functions with arguments 
in the domain where functions are supposed to operate correctly. For 
instance, we want to avoid dividing by zero or dereferencing null pointers. 


— Invariants: we show that some properties are always satisfied during the 
execution of the program, e.g. the variable x will always contain a strictly 
positive number. 


— Functional properties: the program computes the expected output on any
given input, e.g. given a natural number n, the function square n will
produce a natural number m such that m = n².


We have ordered them from the least to the most precise: the first kind only
ensures that basic programming errors are avoided, the second one that our 
program satisfies some good properties, and the last one that it fully behaves 
as expected. Of course, these are not disjoint: absence of errors is a particular 
kind of invariant, and proving functional properties usually requires showing 
invariants first. 


6.7.1 Extrinsic vs intrinsic proofs. There are two common approaches in 
order to prove properties of programs. The extrinsic approach is the way one 
traditionally proceeds [CLRS09]: we first write our program and then we prove 
properties about it. The intrinsic approach is often more adapted to proving 
programs in dependent type theory: it consists in changing the type of our pro- 
gram, so that this type incorporates the properties we want to prove (remember 
that we can see any reasonable formula as a type!). For instance, suppose that 
we want to show that our sorting algorithm sorts lists of natural numbers. 


— In the extrinsic approach we implement our algorithm as a function sort
whose type is


List ℕ → List ℕ


as usual and then show that this function actually sorts a list, i.e. prove
the proposition


(l : List ℕ) → sorted (sort l)


where sorted l is a predicate stating that a list l is sorted.


— In the intrinsic approach, we directly implement our algorithm as a
function sort of type


List ℕ → SortedList


where SortedList is the type of sorted lists of natural numbers.


This example is detailed in section 6.7.2 for the insertion sort algorithm. The 
intrinsic approach usually results in shorter code, and is not significantly harder 
than the extrinsic one, although it usually requires more thought in order to 
formulate the property we want to prove in a way which will give rise to an 
elegant proof. 


Length and concatenation. As a simple example, suppose that we want to show
that the length of the concatenation of two lists is the sum of their lengths. In
the extrinsic approach, we would define concatenation as usual, see section 6.4.5:


_++_ : {A : Set} → List A → List A → List A
[] ++ l = l
(x ∷ l) ++ l' = x ∷ (l ++ l')


and then show that it is additive with respect to lengths, see section 6.6.6:


++-length : {A : Set} → (l l' : List A) →
            length (l ++ l') ≡ length l + length l'
++-length [] l' = refl
++-length (x ∷ l) l' = cong suc (++-length l l')


In the intrinsic approach, we would consider the type of lists of a given length, 
i.e. the type of vectors, and give the following type to the concatenation, see 
section 6.4.7: 


_++_ : {m n : ℕ} {A : Set} → Vec A m → Vec A n → Vec A (m + n)
[] ++ l = l
(x ∷ l) ++ l' = x ∷ (l ++ l')


which both defines the concatenation and shows the property we were looking 
for at once. 


6.7.2 Insertion sort. As a more involved example, we consider the insertion
sort algorithm to sort a list. We recall that a list l = [x₁, x₂, ..., xₙ] is sorted
when x₁ ≤ x₂ ≤ ... ≤ xₙ. For simplicity, we consider here lists of natural
numbers, which are compared using the usual total order. Given a list l and
an element x, we can insert the element x into l, in order to obtain a sorted list,
by comparing x with the elements of l from left to right and inserting it before
the first element which is greater than it. Formally, we can define a function
insert(x, l) by recursion on l by


insert(x, []) = [x]

insert(x, y :: l) = x :: y :: l           if x ≤ y,
                    y :: insert(x, l)     otherwise.

The insertion sort algorithm then proceeds, in order to sort a given list, by
iteratively inserting all its elements in a list which is initially the empty list. We
write sort(l) for the list obtained in this way:


sort([]) = []
sort(x :: l) = insert(x, sort(l))

If you prefer OCaml code: 


let rec insert x = function
  | [] -> [x]
  | y::l ->
    if x <= y then x::y::l
    else y::(insert x l)


let rec sort = function
  | [] -> []
  | x::l -> insert x (sort l)


In order to prove the functional correctness of our algorithm, we have to show
that, given any list as input, the output is a sorted list. It can be shown that


— the empty list is sorted,
— given a sorted list l and any element x, the list insert(x, l) is sorted,

from which we deduce, by induction on l, that the list sort(l) is sorted for any
list l, see [CLRS09, section 2.1].


Extrinsic approach. The correctness of insertion sort using the extrinsic ap- 
proach is shown in figure 6.2. We can define the function insert, to insert an 
element in a list, and sort, to sort a list, by a direct translation of the above 
definitions (for simplicity, we only handle the case of lists of natural numbers). 
Note that the sorting function has the usual type 


sort : List ℕ → List ℕ
In the definition of the insertion function, we use the predicate


_≤?_ : (m n : ℕ) → Dec (m ≤ n)


open import Data.Product
open import Data.Unit hiding (_≤_ ; _≤?_)
open import Relation.Nullary
open import Data.Nat
open import Data.Nat.Properties
open import Data.List

insert : (x : ℕ) → (l : List ℕ) → List ℕ
insert x [] = x ∷ []
insert x (y ∷ l) with x ≤? y
insert x (y ∷ l) | yes _ = x ∷ y ∷ l
insert x (y ∷ l) | no _ = y ∷ insert x l

sort : List ℕ → List ℕ
sort [] = []
sort (x ∷ l) = insert x (sort l)

_≤*_ : (x : ℕ) → (l : List ℕ) → Set
x ≤* [] = ⊤
x ≤* (y ∷ l) = x ≤ y × x ≤* l

≤*-trans : {x y : ℕ} → (x ≤ y) → (l : List ℕ) → y ≤* l → x ≤* l
≤*-trans x≤y [] tt = tt
≤*-trans x≤y (z ∷ l) (y≤z , y≤*l) = ≤-trans x≤y y≤z , ≤*-trans x≤y l y≤*l

≤*-insert : {x y : ℕ} → (x ≤ y) → (l : List ℕ) →
            x ≤* l → x ≤* (insert y l)
≤*-insert x≤y [] tt = x≤y , tt
≤*-insert {x} {y} x≤y (z ∷ l) x≤*zl with y ≤? z
≤*-insert x≤y (z ∷ l) x≤*zl | yes _ = x≤y , x≤*zl
≤*-insert x≤y (z ∷ l) (x≤z , x≤*l) | no _ = x≤z , ≤*-insert x≤y l x≤*l

sorted : (l : List ℕ) → Set
sorted [] = ⊤
sorted (x ∷ l) = x ≤* l × sorted l

insert-sorting : (x : ℕ) → (l : List ℕ) → sorted l → sorted (insert x l)
insert-sorting x [] s = tt , tt
insert-sorting x (y ∷ l) (y≤*l , sl) with x ≤? y
insert-sorting x (y ∷ l) (y≤*l , sl) | yes x≤y =
  (x≤y , (≤*-trans x≤y l y≤*l)) , (y≤*l , sl)
insert-sorting x (y ∷ l) (y≤*l , sl) | no x≰y =
  (≤*-insert (≰⇒≥ x≰y) l y≤*l) , insert-sorting x l sl

sorting : (l : List ℕ) → sorted (sort l)
sorting [] = tt
sorting (x ∷ l) = insert-sorting x (sort l) (sorting l)


Figure 6.2: Correctness of insertion sort (extrinsic version).


which shows that the order on natural numbers is decidable, which is proved
similarly as for equality, see section 6.6.8.

Since lists are defined by induction, and all the reasoning about those will
be performed by induction, it is better to define the predicate of being sorted
for a list by induction. In order to do so, we first define a relation ≤* between
natural numbers and lists such that x ≤* l whenever x ≤ y for every element y
of l, i.e. the elements of l are bounded below by x. This is defined by induction
on l by


— x ≤* l always holds when l is the empty list,
— x ≤* (y :: l) whenever x ≤ y and x ≤* l.


We can then define the predicate of being sorted for a list by induction on the
list by


— the empty list is always sorted,
— a list x :: l is sorted whenever x ≤* l and l is sorted.


Finally, using two easy lemmas involving the relation ≤*, we can show that
given any number x and list l which is sorted, the list insert(x, l) is also sorted
(this is insert-sorting), from which we can deduce that, for any list l, the list
sort(l) is sorted (this is sorting).
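

The two inductive definitions above can also be read as boolean-valued OCaml functions, which is convenient for testing the algorithm on examples before attempting the proof. The sketch below is such a transcription, under the assumption that we work with OCaml integers instead of natural numbers; the names bounded_below and sorted are ours. Running the assertions only checks a few inputs, whereas the Agda function sorting covers every list.

```ocaml
(* bounded_below x l holds when x <= y for every element y of l
   (this mirrors the relation written x ≤* l in the text) *)
let rec bounded_below x = function
  | [] -> true
  | y :: l -> x <= y && bounded_below x l

(* sorted l mirrors the inductive predicate of the text *)
let rec sorted = function
  | [] -> true
  | x :: l -> bounded_below x l && sorted l

let rec insert x = function
  | [] -> [x]
  | y :: l -> if x <= y then x :: y :: l else y :: insert x l

let rec sort = function
  | [] -> []
  | x :: l -> insert x (sort l)

let () =
  assert (sorted (sort [3; 1; 4; 1; 5; 9; 2; 6]));
  assert (not (sorted [2; 1]))
```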


Intrinsic approach. The intrinsic approach to showing the correctness of
insertion sort is presented in figure 6.3. Here, we give to the sorting function the type


sort : (l : List ℕ) → SortedList


which directly specifies that it returns a sorted list. Here, SortedList is the type 
of sorted lists of natural numbers, which can be defined inductively as follows: 
a sorted list is 


— either empty, or 
— of the form x :: l where x ≤* l and l is a sorted list.


There is a subtlety, however: we now want the relation x ≤* l to apply to a
sorted list l, so that it should be defined by mutual induction with the notion
of sorted list. This kind of definition is called an inductive-inductive type, see
section 8.4.3, and requires declaring the types of both SortedList and _≤*_
beforehand.

The insertion function basically takes an element and a sorted list and
returns the sorted list resulting from the insertion of the element. However, in
order to show that the result is a sorted list, we need to return a second element
which states that this result l' satisfies a property akin to ≤*-insert in the
extrinsic approach (you should try by yourself in order to understand why), and
the type of the insertion function is


insert : (x : ℕ) (l : SortedList) →
         Σ SortedList (λ l' → {y : ℕ} → y ≤ x → y ≤* l → y ≤* l')


open import Data.Product
open import Data.Unit hiding (_≤_ ; _≤?_)
open import Relation.Nullary
open import Data.Nat
open import Data.Nat.Properties
open import Data.List

data SortedList : Set
data _≤*_ : ℕ → SortedList → Set

data SortedList where
  empty : SortedList
  cons : (x : ℕ) (l : SortedList) (le : x ≤* l) → SortedList

data _≤*_ where
  ≤*-empty : {x : ℕ} → x ≤* empty
  ≤*-cons : {x y : ℕ} {l : SortedList} →
            x ≤ y → (le : y ≤* l) → x ≤* (cons y l le)

≤*-trans : {x y : ℕ} {l : SortedList} → x ≤ y → y ≤* l → x ≤* l
≤*-trans x≤y ≤*-empty = ≤*-empty
≤*-trans x≤y (≤*-cons y≤z z≤*l) = ≤*-cons (≤-trans x≤y y≤z) z≤*l

insert : (x : ℕ) (l : SortedList) →
         Σ SortedList (λ l' → {y : ℕ} → y ≤ x → y ≤* l → y ≤* l')
insert x empty =
  cons x empty ≤*-empty , (λ y≤x _ → ≤*-cons y≤x ≤*-empty)
insert x (cons y l y≤*l) with x ≤? y
... | yes x≤y =
  cons x (cons y l y≤*l) (≤*-cons x≤y y≤*l) ,
  (λ z≤x z≤*yl → ≤*-cons z≤x (≤*-cons x≤y y≤*l))
... | no x≰y with insert x l
... | l' , p =
  (cons y l' (p (≰⇒≥ x≰y) y≤*l)) ,
  (λ { z≤x (≤*-cons z≤y _) → ≤*-cons z≤y (p (≰⇒≥ x≰y) y≤*l) })

sort : (l : List ℕ) → SortedList
sort [] = empty
sort (x ∷ l) = proj₁ (insert x (sort l))


Figure 6.3: Correctness of insertion sort (intrinsic version).


6.7.3 The importance of the specification. Once we have performed such
a proof, does this guarantee that our function is correct? Well... yes and no! 
On the bright side, our rigorous proof does indeed guarantee that the sorting 
function will always return a sorted list, whichever list we provide to it as input. 
This is actually true for eternity. 

However, one might be surprised to find out that the following function also has
the same type:


bad : (l : List ℕ) → SortedList
bad l = empty


This function always returns the empty list, whichever list is provided as input.
This will not usually be considered as a valid sorting function, although it fits
the bill: the empty list is, after all, a sorted list. The culprit is not the proof
assistant nor the proof here, but the specification itself: what we expect from a
sorting function is not only to return a sorted list, but also that the returned
list has the same elements as the one given as argument.
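

The gap between the two specifications can be observed on a small OCaml sketch. Here sorted is a boolean transcription of the sortedness predicate, and same_elements is one possible way (an assumption of ours, among several reasonable ones) of stating that two lists have the same elements counted with multiplicity. The function bad passes the first check and fails the second.

```ocaml
(* the degenerate "sorting" function: always returns the empty list *)
let bad (_ : int list) : int list = []

(* weak specification: the output is sorted *)
let rec sorted = function
  | [] | [_] -> true
  | x :: (y :: _ as l) -> x <= y && sorted l

(* remove_one x l removes the first occurrence of x in l, if there is one *)
let rec remove_one x = function
  | [] -> None
  | y :: l ->
    if x = y then Some l
    else Option.map (fun l' -> y :: l') (remove_one x l)

(* additional requirement: l and l' have the same elements, with multiplicity *)
let rec same_elements l l' =
  match l with
  | [] -> l' = []
  | x :: l ->
    (match remove_one x l' with
     | None -> false
     | Some l' -> same_elements l l')

let () =
  let input = [3; 1; 2] in
  assert (sorted (bad input));
  assert (not (same_elements input (bad input)));
  assert (same_elements input [1; 2; 3])
```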


Exercise 6.7.3.1. Show that the insertion sort function satisfies the strengthened 
specification. 


This kind of problem is not purely theoretical: in an earlier version of this 
book, the function given in figure 6.3 was actually wrong, and this remained 
unnoticed because the specification of sorted lists was also wrong... 


6.8 Termination 


6.8.1 Termination and consistency. In order to maintain consistency, Agda
ensures that all the defined functions are terminating, by which we mean that
they will always give a result after a finite amount of time. To understand why
this is required, let us force Agda to accept a non-terminating function and see
what happens (spoiler: inconsistency). This can be achieved by using the
pragma {-# TERMINATING #-} before the definition of a function, which means
“trust me, this function is terminating”. For instance, the function f defined on
natural numbers by f(n) = f(n + 1) is clearly not terminating. It can be given
the type ℕ → ⊥, from which it is easy to make the system inconsistent:


{-# TERMINATING #-}
f : ℕ → ⊥
f n = f (suc n)

absurd : ⊥
absurd = f zero

0≡1 : 0 ≡ 1
0≡1 = ⊥-elim absurd


Yes, we have managed to prove 0 = 1. If we do not use the pragma, Agda
correctly detects that the function f is problematic and prevents us from defining
it:


Termination checking failed for the following functions:
  f
Problematic calls:
  f (suc n)


6.8.2 Structural recursion. In order to ensure that the programs are termi- 
nating, Agda uses a “rough” criterion, which is simple to check and safe, in 
the sense that it ensures every accepted program is terminating. This criterion 
is that recursive programs must be structurally recursive, meaning that all the 
recursive calls must be done on strict subterms of the argument (we say that 
the argument is structurally decreasing). 

For instance, the following function computes the n-th term of the Fibonacci
sequence, defined by f₀ = 0, f₁ = 1 and fₙ₊₂ = fₙ₊₁ + fₙ:


fib : ℕ → ℕ
fib zero = zero
fib (suc zero) = suc zero
fib (suc (suc n)) = fib n + fib (suc n)


In the third case, the argument is suc (suc n), whose strict subterms are suc n 
and n, see section 5.1.2. Since the recursive calls are performed with those as 
arguments, the program is accepted. If we had instead used recursive calls of 
one of the following forms then the program would be rejected 


— fib (suc (suc n)): the argument suc (suc n) is a subterm of itself, but
not a strict one,


— fib (zero + n): the term zero + n is not a strict subterm of suc (suc
n), i.e. the first does not occur in the second; as you can see, the
notion of subterm has to be taken purely syntactically here, no reduction is
performed (the fact that zero + n reduces to n is not taken into account).


Multiple arguments. In the case where the function has two arguments (and 
this generalizes to multiple arguments), either the first argument must be struc- 
turally decreasing (in which case there is no restriction on the second one) or it 
should stay the same and the second argument must be structurally decreasing. 
Pairs of arguments are thus compared using the lexicographic order, see ap- 
pendix A.3.3. For instance, the following Ackermann function is also accepted: 


ack : (x y : ℕ) → ℕ
ack zero n = suc n
ack (suc m) zero = ack m (suc zero)
ack (suc m) (suc n) = ack m (ack (suc m) n)


In the second case, the first argument m is a strict subterm of the first argument suc m
of the function. In the third case, one recursive call is performed with m as
first argument, which is a strict subterm of the first argument suc m, and the second
recursive call is performed with suc m as first argument, which stays unchanged,
and n as second argument, which is a strict subterm of the second argument suc n.
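

For comparison, here is a sketch of the same function in OCaml, where no termination check is performed: the definition is accepted regardless, and it is the lexicographic argument above which tells us that it terminates on non-negative inputs. The sample values are the classical ones for the Ackermann function (for instance ack 2 n = 2n + 3).

```ocaml
(* Ackermann function; each recursive call decreases the pair (m, n)
   for the lexicographic order, assuming m >= 0 and n >= 0 *)
let rec ack m n =
  if m = 0 then n + 1
  else if n = 0 then ack (m - 1) 1
  else ack (m - 1) (ack m (n - 1))

let () =
  assert (ack 0 0 = 1);
  assert (ack 1 2 = 4);
  assert (ack 2 3 = 9);
  assert (ack 3 3 = 61)
```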


Rejecting valid programs. The restriction to structurally recursive functions has
the advantage of being simple, but the downside is that some programs which are
not problematic are rejected by Agda, because they do not satisfy this criterion
even though they are terminating. For instance, consider the following function
which computes the quotient of two natural numbers:


div : ℕ → ℕ → ℕ
div m n with m <? suc n
div m n | yes _ = zero
div m n | no _ = suc (div (m ∸ suc n) n)


To be precise, div m n computes the quotient of m and n + 1, in order to avoid
problematic divisions by zero. Even though it is terminating, this function is
rejected. Namely, in the second case, m ∸ suc n is not a strict subterm of m:
Agda is not smart enough to notice that the recursive calls are performed with
strictly decreasing values for m and must therefore be terminating. However,
this does not mean that we cannot define division in Agda: there are other
ways to formulate division, which are only slightly more complicated than the
usual way shown above, and they get accepted, see section 6.8.7.
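

As a sanity check of the algorithm itself (not of its acceptance by Agda), the same definition can be sketched in OCaml, which performs no termination analysis. As in the text, div m n is meant to compute the quotient of m by n + 1; the inputs are assumed to be non-negative.

```ocaml
(* div m n computes the quotient of m by n + 1, for m >= 0 and n >= 0;
   the first argument strictly decreases at each recursive call *)
let rec div m n =
  if m < n + 1 then 0
  else 1 + div (m - (n + 1)) n

let () =
  assert (div 7 2 = 7 / 3);
  assert (div 6 2 = 6 / 3);
  assert (div 5 0 = 5);
  assert (div 2 2 = 0)
```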


6.8.3 A bit of computability. The criterion used by Agda to determine if a 
program is terminating is overly restrictive and it has to be so: it was shown 
by Turing that the halting problem, which consists in deciding whether a pro- 
gram terminates or not, is undecidable, i.e. there is no program which given a 
program as input determines whether it eventually terminates or not [Tur37]. 
However, we have indicated that for a given function, even if the straightfor- 
ward terminating implementation is abusively rejected by Agda, there is usually 
a way to reformulate it in order to obtain an implementation which is accepted. 
We show here that there are however some computable functions which cannot 
be implemented: the language Agda is not Turing complete. 


Computable functions. Given two sets A and B, a (partial) function f from A 
to B, associates to some elements x of A an image f(x) in B. We write dom(f) 
for the set of elements of A which have an image under f, called the domain 
of f, and say that the function f is total when dom(f) = A. We say that a 
function 

f : ℕ → ℕ


is implemented by an OCaml function
f : int -> int
when


— for every natural number n ∈ dom(f), the computation of f n terminates
and its result is f(n),


— for every natural number n ∉ dom(f), the computation of f n does not
return a result.


Of course, we can define similarly the notion of an implementation of the func- 
tion f in any other programming language which canonically contains the nat- 
ural numbers, such as Agda, and we could extend the notion of computable 
function to other data types than natural numbers. A function which can be
implemented by some function in OCaml is said to be computable (of course, 
we could have chosen any other reasonable programming language in order to 
define computable functions, see section 3.3.6 for another possible definition). 

For instance, the function f : ℕ → ℕ such that dom(f) is the set of odd
natural numbers and f(n) = n + 1 for every n ∈ dom(f) can be implemented
in OCaml by


let rec f n = if n mod 2 = 0 then f n else n + 1
or by
let rec f n = while n mod 2 = 0 do () done; n + (n mod 2)


and is thus computable. As illustrated above, there are generally multiple ways 
to implement a given function. 


Programming with total functions. The programming language Agda has one 
particularity compared to usual programming languages: since every function 
is terminating, all the functions which can be implemented are total. This has
the following consequence.


Theorem 6.8.3.1. In a programming language such as Agda in which all the func- 
tions which can be implemented are total, there is a total computable function 
which cannot be implemented. 


Proof. The idea is that if all total computable functions were implementable
in Agda then some partial function would also be implementable. Here is a
detailed sketch of the proof. The functions f : ℕ → ℕ which can be implemented
in Agda are described by a string, and are therefore countable: we can
enumerate those and write fᵢ for the i-th implementable function. The function
g : ℕ × ℕ → ℕ such that g(i, n) = fᵢ(n) is also computable: given an
argument (i, n) the function g enumerates all strings in order and, for each string,
tests whether it is a valid Agda definition of a function of type ℕ → ℕ, until it
finds the i-th such function fᵢ, at which point it returns the evaluation of fᵢ
on the argument n (this would require programming an evaluator of Agda
functions in OCaml, which can be done). Suppose that g can be implemented in
Agda (otherwise, we can conclude immediately). Then the function d : ℕ → ℕ
defined by d(n) = g(n, n) + 1 is clearly also implementable. Therefore, there is
an index i such that d = fᵢ and we have


d(i) = g(i, i) + 1 = fᵢ(i) + 1 = d(i) + 1


Contradiction.
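

The diagonal construction at the heart of this proof can be illustrated on a finite scale in OCaml. In the sketch below, the list fs is a hypothetical stand-in for the enumeration of implementable functions (the real enumeration is infinite and requires an evaluator, which we do not write here), and diagonal fs plays the role of d: by construction it differs from the i-th function on the argument i.

```ocaml
(* diagonal fs n applies the n-th function of fs to n and adds one;
   it is only defined for 0 <= n < List.length fs *)
let diagonal (fs : (int -> int) list) (n : int) : int =
  (List.nth fs n) n + 1

(* an arbitrary finite "enumeration" of total functions *)
let fs = [ (fun n -> n); (fun n -> 2 * n); (fun n -> n * n); (fun _ -> 7) ]

let d = diagonal fs

let () =
  (* d differs from the i-th function of the list on the argument i,
     so d cannot be any of the functions of the list *)
  List.iteri (fun i f -> assert (d i <> f i)) fs
```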


Functions which cannot be implemented are rare. In practice, all the usual func- 
tions that one manipulates can be implemented in Agda. An example of a 
function which cannot be implemented in Agda is an interpreter for the Agda 
language itself. 


6.8.4 The number of bits. We have mentioned that the restriction to struc-
turally recursive functions is quite strong and rejects perfectly terminating func- 
tions. Let us study an example and see the available workarounds. We consider 
the function bits which associates to every natural number n, the number of 
bits necessary to write it in base 2. For instance, 


bits(0) = 0    bits(1) = 1    bits(2) = 2    bits(3) = 2    bits(4) = 3
This function is essentially a rounded base 2 logarithm: it can be expressed as
bits(n) = 1 + ⌊log₂(n)⌋


where, by convention, log₂(0) = −1. This function satisfies the following
equations


bits(0) = 0        bits(n + 1) = 1 + bits(⌊(n + 1)/2⌋)


which allows it to be computed recursively. In OCaml, it can thus be imple- 
mented as 


let rec bits n =
  if n = 0 then 0 else 1 + bits (n / 2)


This function is terminating because the recursive call is done on a smaller 
natural number, thanks to the division by 2. In order to perform an analogous 
definition in Agda, we can define division by 2 with 


div2 : ℕ → ℕ
div2 zero = zero
div2 (suc zero) = zero
div2 (suc (suc n)) = suc (div2 n)
and then translate the above OCaml definition as


bits : ℕ → ℕ
bits zero = zero
bits (suc n) = suc (bits (div2 (suc n)))


This function is not accepted by Agda (without enforcing termination), because 
the recursive call of bits is performed on div2 (suc n), which is not a strict 
subterm of the argument suc n. 


6.8.5 The fuel technique. In order to define our function, a general technique 
consists in adding new arguments to it, so that the recursive calls are performed 
with one of these arguments being structurally decreasing. Typically, we can 
add as argument a natural number which will decrease at each call (when the 
function is called with suc n, the recursive call is performed with n), provided 
that we know in advance a bound on the number of recursive calls (i.e. we also 
have to add a proof that this argument will be non-zero so that we can decrease 
it). This is called the fuel technique because this natural number can be thought 
of as some fuel which we are consuming in order to perform the recursive calls. 

For instance, in order to define the bits function, we can add a natural 
number fuel as argument to the function, which is to be structurally decreasing: 


bits-fuel : (n : ℕ) → (fuel : ℕ) → ℕ
bits-fuel zero f = zero
bits-fuel (suc n) zero = ?
bits-fuel (suc n) (suc f) = suc (bits-fuel (div2 (suc n)) f)


In the case the original argument n is 
— zero: we can return zero, 
— suc n: 


— if the fuel is of the form suc f, we can make a recursive call on 
div2 (suc n) with f as fuel: the fuel argument is structurally de- 
creasing (we have consumed one unit of fuel), 


— if the fuel is zero however, we do not know what to do (this is the ? 
above): we cannot perform a recursive call with structurally smaller 
fuel because we do not have fuel anymore (there is no strict subterm 
of zero). 


In order to overcome the problem encountered in the last case, we have to ensure 
that we never “run out of fuel”, i.e. that the fuel is strictly positive when we need 
to perform a recursive call. This can be achieved by adding a second additional 
argument which ensures an invariant on the fuel which will enforce this. For
instance, we can add the requirement that the fuel is always greater than or
equal to the original argument n. When performing a recursive call, we will
have to show that this invariant is preserved: under the hypothesis
$n + 1 \le f + 1$, we have to show $(n + 1)/2 \le f$, which can be done as
follows:

$$(n + 1)/2 \le (f + 1)/2 \le f$$

We thus define 


bits-fuel : (n : ℕ) → (fuel : ℕ) → (n ≤ fuel) → ℕ
bits-fuel zero f p = zero
bits-fuel (suc n) zero ()
bits-fuel (suc n) (suc f) p =
  suc (bits-fuel (div2 (suc n)) f n+1/2≤f)
  where
  n+1/2≤f : div2 (suc n) ≤ f
  n+1/2≤f = begin
    div2 (suc n) ≤⟨ ≤-div2 p ⟩
    div2 (suc f) ≤⟨ ≤-div2-suc f ⟩
    f ∎


This follows the same pattern as the previous definition, except that we know 
that the problematic case where the original argument is suc n and the fuel is 
zero will not happen: by the third argument, we would have suc n ≤ zero,
which is impossible. The code is longer than above because, when performing
the recursive call on div2 (suc n) with f as fuel, we have to provide a third
argument, which shows that the invariant is preserved, i.e. that
div2 (suc n) ≤ f holds. This is shown in the lemma named n+1/2≤f, using two
auxiliary lemmas

≤-suc : (n : ℕ) → n ≤ suc n

and

≤-div2-suc : (n : ℕ) → div2 (suc n) ≤ n


whose proof is left to the reader. We can finally define the bits function by 
providing, as fuel, a high enough number. For instance n is suitable: 


bits : ℕ → ℕ
bits n = bits-fuel n n ≤-refl
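
For comparison, the fuel technique can also be sketched in OCaml. This is only an illustration, not part of the Agda development: the names bits_fuel and bits are ours, arguments are assumed to be non-negative, and since OCaml has no proof argument, running out of fuel is reported with an option type instead of being ruled out statically.

```ocaml
(* Fuel-based variant of bits: [fuel] bounds the number of recursive calls.
   Returns [None] when the fuel runs out before the argument reaches 0. *)
let rec bits_fuel n fuel =
  if n = 0 then Some 0
  else if fuel = 0 then None
  else
    match bits_fuel (n / 2) (fuel - 1) with
    | Some b -> Some (1 + b)
    | None -> None

(* Using n itself as fuel is always enough for n >= 0: the invariant
   n <= fuel is preserved because n / 2 <= n - 1 whenever n >= 1. *)
let bits n =
  match bits_fuel n n with
  | Some b -> b
  | None -> assert false (* unreachable thanks to the invariant *)
```

The assert false branch plays the role of the absurd pattern in the Agda version: the invariant guarantees that it is never reached, but here this is an argument made on paper rather than checked by the compiler.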


6.8.6 Well-founded induction. We now present a generalization of this
technique called well-founded induction. The fundamental reason why the fuel
technique is working is that we are decreasing some natural number and there is
no infinite strictly decreasing sequence of natural numbers, so that we know
that the recursive calls will stop at some point. The technique presented here
axiomatizes this situation and is also detailed in appendix A.3.


Well-founded induction and recursion. In mathematics, a relation $R$ on a set
$A$ is a subset of $A \times A$. Given elements $x$ and $y$ of $A$ such that
$(x, y) \in R$, we write $x \mathrel{R} y$, and think of $x$ as being “smaller”
than $y$. The relation $R$ is well-founded when there is no infinite sequence
$(x_i)_{i \in \mathbb{N}}$ of elements of $A$ which is decreasing, i.e. such
that $x_{i+1} \mathrel{R} x_i$ for every index $i$:

$$\ldots \mathrel{R} x_3 \mathrel{R} x_2 \mathrel{R} x_1 \mathrel{R} x_0$$


Example 6.8.6.1. On the set $\mathbb{N}$ of natural numbers the following two
relations are well-founded:

— the relation $\prec$ such that $n \prec n + 1$ for every $n \in \mathbb{N}$,
— the usual strict order relation $<$.

Example 6.8.6.2. If you are looking for counter-examples, the relation $<$ on
$\mathbb{R}$ or on $\mathbb{Q}$ is not well-founded, nor is the relation $\le$
on $\mathbb{N}$.

Given $x \in A$, we write

$$\downarrow x = \{\, y \in A \mid y \mathrel{R} x \,\}$$


for the set of predecessors of x. The following well-founded induction principle 
holds for well-founded relations, which generalizes the usual induction principle 
on natural numbers: 


Theorem 6.8.6.3 (Well-founded induction). Suppose given a set $A$, a
well-founded relation $R$ on $A$ and a predicate $P$ on the elements of $A$
such that, for every $x \in A$, if $P$ holds on every element of
$\downarrow x$ then $P$ holds on $x$. Then $P$ holds for every element of $A$.

Example 6.8.6.4. On $\mathbb{N}$, the well-founded induction principle
associated to $\prec$ is the usual induction principle: given a predicate $P$
such that $P(0)$ holds, and $P(n)$ implies $P(n + 1)$ for every
$n \in \mathbb{N}$, we have that $P(n)$ holds for every $n \in \mathbb{N}$.


Example 6.8.6.5. On $\mathbb{N}$, the well-founded induction principle
associated to $<$ is the strong induction principle: given a predicate $P$ such
that, for every $n \in \mathbb{N}$, if $P(i)$ holds for every $i < n$ then
$P(n)$ holds, we have that $P(n)$ holds for every $n \in \mathbb{N}$. In
formulas, this induction principle can be formulated as

$$(\forall n \in \mathbb{N}.\, (\forall m \in \mathbb{N}.\, m < n \Rightarrow P(m)) \Rightarrow P(n)) \Rightarrow \forall n \in \mathbb{N}.\, P(n)$$

for any predicate $P$ on natural numbers.


Given a function $f : A \to B$ and $A' \subseteq A$, we write
$f|_{A'} : A' \to B$ for the function $f$ restricted to $A'$. The following
well-founded recursion principle can be shown, which generalizes the definition
of a function by recursion:

Theorem 6.8.6.6 (Well-founded recursion). Suppose given sets $A$ and $B$, a
well-founded relation $R$ on $A$ and a function $r$ which to every $x \in A$
and function $\downarrow x \to B$ associates an element of $B$. Then there is a
unique function $f : A \to B$ such that, for every $x \in A$,

$$f(x) = r(x, f|_{\downarrow x})$$

In the above theorem, we are defining a function $f$ by recursion: the function
$r$ describes how to produce the value of $f(x)$ from $x$ and all the values of
$f(y)$ for $y$ smaller than $x$, i.e. such that $y \mathrel{R} x$. Note that
the type of $r$ is the dependent type
$(\Sigma(x : A).(\downarrow x \to B)) \to B$.


Example 6.8.6.7. Consider the well-founded relation $\prec$ on $\mathbb{N}$.
We have

$$\downarrow 0 = \emptyset \qquad \downarrow (n + 1) = \{n\}$$

for $n \in \mathbb{N}$. The associated well-founded recursion principle thus
states that, given a number $r_0 \in \mathbb{N}$ and a function
$r : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$, there is a unique function
$f : \mathbb{N} \to \mathbb{N}$ such that

$$f(0) = r_0 \qquad f(n + 1) = r(n, f(n))$$

for every $n \in \mathbb{N}$. If we use the following generic notation for the
image of $n$ under the function associated to $r_0$ and $r$,

$$f(n) = \mathrm{rec}(n, r_0, r)$$

we have

$$\mathrm{rec}(0, r_0, r) = r_0 \qquad \mathrm{rec}(n + 1, r_0, r) = r(n, \mathrm{rec}(n, r_0, r))$$

which are precisely the rules for the usual recursor associated to natural
numbers in λ-calculus, see section 4.3.6.
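
As a small illustration, the two equations for rec can be transcribed in OCaml. This is a sketch under our own conventions: the name rec_nat is hypothetical, the function r is taken in curried form rather than on a pair, and the argument n is assumed to be non-negative.

```ocaml
(* Recursor on natural numbers:
   rec(0, r0, r) = r0  and  rec(n + 1, r0, r) = r(n, rec(n, r0, r)). *)
let rec rec_nat n r0 r =
  if n = 0 then r0 else r (n - 1) (rec_nat (n - 1) r0 r)

(* Factorial: the step function receives the predecessor k and the value
   already computed for k. *)
let fact n = rec_nat n 1 (fun k acc -> (k + 1) * acc)

(* Sum of the integers from 0 to n. *)
let sum n = rec_nat n 0 (fun k acc -> (k + 1) + acc)
```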


The well-founded subterm order. We now explain that the kind of recursion
which is supported in Agda is a particular case of the one given in
theorem 6.8.6.6. Suppose given a first order signature $\Sigma$ and consider
the set $\mathcal{T}_\Sigma$ of terms it generates, see section 5.1.2. Given a
term $t$, we write $|t|$ for its size, defined as the number of operators it
contains:

$$|f(t_1, \ldots, t_n)| = 1 + \sum_{i=1}^{n} |t_i| \qquad |x| = 0$$

Given two terms $s$ and $t$, we write $s < t$ when $s$ is a strict subterm of
$t$. Note that $s < t$ implies $|s| < |t|$.

Lemma 6.8.6.8. The relation $<$ on terms is well-founded.

Proof. Suppose that there is an infinite sequence of terms $t_i$ such that

$$t_0 > t_1 > t_2 > \ldots$$

Then we have an infinite strictly decreasing sequence of natural numbers

$$|t_0| > |t_1| > |t_2| > \ldots$$

which is impossible because $<$ is well-founded on $\mathbb{N}$, see
example 6.8.6.1.

The recursion principle associated to this well-founded order is essentially
the one used in Agda in order to define functions: a function can be defined by
recursion from the current value of the argument, as well as the image of
strict subterms.


Accessible elements. Suppose given a set $A$ and a relation $R$ on it, not
supposed to be well-founded. We can define a subset of $A$, written
$\mathrm{Acc}_R(A)$, which is the largest subset of $A$ on which well-founded
induction and recursion works, as follows.

A subset $B \subseteq A$ is $R$-closed when, for every $x \in A$,
$\downarrow x \subseteq B$ implies $x \in B$: if an element has all its
predecessors in $B$ then it is also in $B$. We define the set
$\mathrm{Acc}_R(A)$ as the smallest $R$-closed subset of $A$ (such a set exists
since it can be obtained as the intersection of all $R$-closed subsets of $A$).
An element of $A$ is accessible with respect to $R$ when it belongs to
$\mathrm{Acc}_R(A)$.


Theorem 6.8.6.9. A relation $R$ on a set $A$ is well-founded if and only if
every element of $A$ is accessible, i.e. $A = \mathrm{Acc}_R(A)$.

In particular, given a relation $R$ on a set $A$, the restriction of $R$ to
$\mathrm{Acc}_R(A)$ is always well-founded.

Example 6.8.6.10. In $\mathbb{N}$ equipped with the relation $\prec$ or $<$,
every element is accessible.

Example 6.8.6.11. On the set $\mathbb{Z}$ equipped with the relation $<$, no
element is accessible.

Example 6.8.6.12. On the set $\mathbb{R}$ equipped with the relation $\prec$
such that $x \prec x + 1$ for every $x$ in $\mathbb{R}$, the set of positive
reals, the set $\mathrm{Acc}_\prec(\mathbb{R} \setminus \{-1\})$ is
$\mathbb{N}$.
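
On a finite set, the smallest R-closed subset can be computed by iteration, which gives a concrete feeling for accessibility. The following OCaml sketch is only illustrative: the function accessible and its representation of a relation (a list of elements together with a function preds returning the predecessors of an element, assumed to belong to that list) are our own choices.

```ocaml
(* Accessible elements of a finite relation, computed as a least fixpoint:
   starting from the empty set, we repeatedly add every element all of whose
   predecessors are already in the set, until nothing changes. *)
let accessible (elts : int list) (preds : int -> int list) : int list =
  let rec grow acc =
    let acc' =
      List.filter
        (fun x ->
          List.mem x acc || List.for_all (fun y -> List.mem y acc) (preds x))
        elts
    in
    (* acc is always a sublist of acc', so equal lengths mean equal sets *)
    if List.length acc' = List.length acc then acc else grow acc'
  in
  grow []

(* Example: 0 <- 1 <- 2 is a finite chain, whereas 3 and 4 are predecessors
   of each other and thus start an infinite decreasing sequence. *)
let example =
  accessible [0; 1; 2; 3; 4]
    (function 1 -> [0] | 2 -> [1] | 3 -> [4] | 4 -> [3] | _ -> [])
(* example = [0; 1; 2] *)
```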


Well-foundedness in Agda. Although Agda only implements well-founded
recursion on the subterm order natively, we will see that this is enough for
most applications: we can encode general well-founded recursion (at least for
most usual well-founded orders; some will always stay out of reach by arguments
similar to those in section 6.8.3).

Recall from section 6.5.9 that we can define a type Rel A of relations on a
type A, which is A → A → Set. For instance, the strict order on natural numbers
is a relation:

_<_ : Rel ℕ
m < n = suc m ≤ n

The direct definition of well-foundedness is difficult to implement in Agda,
inelegant and difficult to use (mainly because it is defined using a negation),
which makes it a poor choice for defining the predicate of being well-founded
for a relation. A much more satisfactory approach consists in taking the
characterization given in theorem 6.8.6.9 as a definition.

Following the module Induction.WellFounded of the standard library, we 
define the predicate of being accessible as 


data Acc {A : Set} (_<_ : Rel A) (x : A) : Set where
  acc : ((y : A) → y < x → Acc _<_ y) → Acc _<_ x


Given a relation R on a type A and an element x of type A, having a proof of 
Acc R x means that x is accessible with respect to R. Namely, the inductive 
definition states that the predicate Acc R is defined as the smallest one such 
that Acc R x whenever we have Acc R y for every predecessor y of x, which 
is precisely the definition of accessibility. Finally, we define a relation to be 
well-founded when every element is accessible with respect to it: 


WellFounded : {A : Set} → (_<_ : Rel A) → Set
WellFounded {A} _<_ = (x : A) → Acc _<_ x


Natural numbers are well-founded. As an instance of the above formalization, 
let us show that the strict order < on natural numbers is well-founded. It turns 
out that the usual definition of the order (see section 6.5.9) is not very well- 
suited for the induction we want to perform, and it proves simpler to use the 
following alternative definition of the order: 


data _≤′_ (m : ℕ) : ℕ → Set where
  ≤′-refl : m ≤′ m
  ≤′-step : {n : ℕ} → m ≤′ n → m ≤′ suc n

the associated strict order being defined by

_<′_ : ℕ → ℕ → Set
m <′ n = suc m ≤′ n


We can then show that the relation <′ is well-founded as follows:

<′-wellFounded : WellFounded _<′_
<′-wellFounded n = acc (lem n)
  where
  lem : (n m : ℕ) → m <′ n → Acc _<′_ m
  lem (suc n) _ ≤′-refl = <′-wellFounded n
  lem (suc n) m (≤′-step m<′n) = lem n m m<′n


Of course, one is usually rather interested in the fact that the usual
definition < of the strict order is well-founded. This can either be deduced
from the fact that the relations < and <′ are equivalent, or it can be shown
directly. Namely, the following lemma can be shown on the usual definition of
the partial order

≤-last : {m n : ℕ} → m ≤ n → m ≡ n ⊎ m < n
≤-last {n = zero} z≤n = inj₁ refl
≤-last {n = suc n} z≤n = inj₂ (s≤s z≤n)
≤-last (s≤s m≤n) with ≤-last m≤n
≤-last (s≤s m≤n) | inj₁ m≡n = inj₁ (cong suc m≡n)
≤-last (s≤s m≤n) | inj₂ m<n = inj₂ (s≤s m<n)


from which the above proof can be adapted to show that < is well-founded: 


<-wellFounded : WellFounded _<_
<-wellFounded n = acc (lem n)
  where
  lem : (n m : ℕ) → m < n → Acc _<_ m
  lem (suc n) m m<n with ≤-last m<n
  lem (suc n) _ _ | inj₁ refl = <-wellFounded n
  lem (suc n) m _ | inj₂ (s≤s m<n) = lem n m m<n


Well-founded definition of bits. As an application, we shall define our
favorite bits function by well-founded recursion on the order < on natural
numbers. Given an argument n, in order to perform recursive calls on smaller
arguments (with respect to the < order), we add an argument of type Acc _<_ n
to the naive definition of bits, which is a proof that n is accessible, i.e.
that all the elements strictly smaller than n are also accessible. Namely, an
element of this type is of the form acc a with a of type

(m : ℕ) → m < n → Acc _<_ m


It is used here as a witness that the function is terminating. For instance, the 
function bits becomes 


bits-wf : (n : ℕ) → Acc _<_ n → ℕ
bits-wf zero _ = zero
bits-wf (suc n) (acc a) =
  suc
    (bits-wf
      (div2 (suc n))
      (a (div2 (suc n)) (s≤s (≤-div2-suc n))))


In order to perform the recursive call on div2 (suc n), we have to show that 
this number is accessible, which is deduced from the fact that div2 (suc n) < 
suc n holds, as explained above. Finally, the usual bits function, without the 
extra argument, is defined by providing the proof that every natural number is 
accessible as second argument (which is precisely the fact that the relation < is 
well-founded on natural numbers): 


bits : ℕ → ℕ
bits n = bits-wf n (<-wellFounded n) 


Well-founded recursion without accessibility. In practice, it is quite annoying
to require that the “average Agda user” should understand the definition of the
accessibility predicate, so the standard library defines the following function
in the module Data.Nat.Induction, which expresses well-founded recursion in a
way which does not require using arguments of type Acc. It is also nicer to
read, since it precisely corresponds to the strong induction principle, as
formulated in example 6.8.6.5:

<-rec : (P : ℕ → Set) →
        ((n : ℕ) → ((m : ℕ) → m < n → P m) → P n) →
        (n : ℕ) → P n
<-rec P r n = lem n (<-wellFounded n)
  where
  lem : (n : ℕ) → Acc _<_ n → P n
  lem n (acc a) = r n (λ m m<n → lem m (a m m<n))


In the end, this is all you will ever need to define functions by well-founded re- 
cursion on natural numbers in practice (and of course the same can be performed 
for any well-founded relation). 

For instance, the function computing the number of bits of a natural number 
n can be implemented by first adding to the naive implementation a new argu- 
ment, which is a proof that the function is already defined for strictly smaller 
arguments than the current natural number: this argument will have type 


(m : ℕ) → m < n → ℕ


and provides a function to compute the recursive calls on strictly smaller argu- 
ments. We thus obtain the following function: 


bits-rec : (n : ℕ) → ((m : ℕ) → m < n → ℕ) → ℕ
bits-rec zero r = zero
bits-rec (suc n) r = suc (r (div2 (suc n)) (s≤s (≤-div2-suc n)))


Finally, we can deduce an implementation of the expected bits function by using 
it as an argument of <-rec: 


bits : ℕ → ℕ
bits = <-rec (λ n → ℕ) bits-rec
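
The same pattern can be mimicked in OCaml, as a rough sketch: the combinator lt_rec below (a hypothetical name) passes to the body a function for recursive calls, and the proof of m < n required in Agda is replaced by a dynamic check which raises an exception when the body attempts a call on an argument which is not a strictly smaller natural number.

```ocaml
(* Strong recursion on natural numbers: the body [r] receives the current
   argument [n] together with a function computing the result on arguments
   m such that 0 <= m < n. *)
let rec lt_rec (r : int -> (int -> 'a) -> 'a) (n : int) : 'a =
  r n (fun m ->
      if 0 <= m && m < n then lt_rec r m
      else invalid_arg "lt_rec: recursive call on a non-smaller argument")

(* Body of the bits function: recursive calls go through [self]. *)
let bits_rec n self = if n = 0 then 0 else 1 + self (n / 2)

let bits = lt_rec bits_rec
```

Termination of every function defined through lt_rec follows from the check, whereas in Agda the corresponding guarantee is obtained at compile time from the proof argument.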


6.8.7 Division and modulo. As another classic illustration of the above
techniques, we shall implement euclidean division. It associates to each pair
of natural numbers $m$ and $n$, with $n > 0$, a pair of natural numbers $q$ and
$r$, respectively called the quotient and remainder of the division of $m$ by
$n$, such that

$$m = q \times n + r \quad\text{and}\quad r < n \qquad (6.2)$$

Numbers satisfying these properties can be shown to be unique, so that this
is a proper specification for euclidean division. Their traditional notations
are respectively

$$q = m / n \qquad r = m \bmod n$$

and they can be computed using the following classic algorithm, here
implemented in OCaml:

let rec euclid m n =
  if m < n then (0, m) else
    let (q, r) = euclid (m - n) n in
    (q + 1, r)

Well-founded definition: external approach. The direct translation of the above
algorithm is naturally performed by well-founded induction on m, and is
justified by the fact that the recursive call is performed on m - n which is
strictly smaller than m because we assume that n is strictly positive. The type
of the division operation we want to define is

(m n : ℕ) → 0 < n → ℕ × ℕ

taking m and n as arguments, as well as a proof of 0 < n, and returning the pair
(q , r) as result. Since we perform the induction on the first argument, we are
going to define, by well-founded recursion on m, a function of type

(n : ℕ) → 0 < n → ℕ × ℕ

for every natural number m. In order not to have to type this every time, we
define the notation Euclid m for this type:

Euclid : ℕ → Set
Euclid m = (n : ℕ) → 0 < n → ℕ × ℕ


We can then implement euclidean division by well-founded recursion, following
the above definition: when computing the result of the division of m by n, we
first check whether m < n holds or not, and provide an answer appropriately,
which requires performing a recursive call in the case m ≮ n (which requires
additional code because we now have to provide a proof that m ∸ n < m holds).
The definition is

div : (m : ℕ) → Euclid m
div m = <-rec Euclid rec m
  where
  rec : (m : ℕ) → ((m' : ℕ) → m' < m → Euclid m') → Euclid m
  rec m f n 0<n with m <? n
  rec m f n 0<n | yes m<n = zero , m
  rec m f n 0<n | no m≮n with
    f (m ∸ n) (m∸n<m m n (<-transˡ 0<n (≮⇒≥ m≮n)) 0<n) n 0<n
  rec m f n 0<n | no m≮n | q , r = suc q , r


and uses the following auxiliary lemma (in addition to those already present in 
the standard library): 


m∸n<m : (m n : ℕ) → 0 < m → 0 < n → m ∸ n < m
m∸n<m (suc m) (suc n) _ _ = s≤s (m∸n≤m m n)


Finally, it can be shown that this implementation is correct, in the sense that 
it satisfies the specification (6.2). Formally, we can show 


div-correct :
  (m n : ℕ) (0<n : 0 < n) →
  m ≡ proj₁ (div m n 0<n) * n + (proj₂ (div m n 0<n)) ×
  (proj₂ (div m n 0<n)) < n

However, the proof is not obvious, due to the use of the well-founded induction
and a more satisfactory approach is detailed in the next section.

Well-founded definition: intrinsic approach. We now present the intrinsic
approach which, as explained in section 6.7.1, consists in enriching the type,
so that the implementation is correct by definition (as opposed to being proved
correct after being defined). We first define a type corresponding to the
specification of euclidean division:

Euclid : ℕ → Set
Euclid m = (n : ℕ) → 0 < n →
  Σ ℕ (λ q → Σ ℕ (λ r → m ≡ q * n + r × r < n))

so that euclidean division will have type

(m : ℕ) → Euclid m

and thus consist of a function which takes as arguments
— a natural number m,
— a natural number n,
— a proof of 0 < n,
and will return a dependent 4-tuple consisting of
— a natural number q (the quotient),
— a natural number r (the remainder),
— a proof of m ≡ q * n + r,
— a proof of r < n,

which is a type theoretic description of the specification (6.2). The implemen- 
tation is very similar to the above one, except that we now have to return, 
in addition to the quotient and the remainder, the two proofs indicated above 
which show that those results are correct. The full code is 


div : (m : ℕ) → Euclid m
div m = <-rec Euclid rec m
  where
  rec : (m : ℕ) → ((m' : ℕ) → m' < m → Euclid m') → Euclid m
  rec m f n 0<n with m <? n
  rec m f n 0<n | yes m<n = zero , m , refl , m<n
  rec m f n 0<n | no m≮n with
    f (m ∸ n) (m∸n<m m n (<-transˡ 0<n (≮⇒≥ m≮n)) 0<n) n 0<n
  rec m f n 0<n | no m≮n | q , r , e , r<n = suc q , r , lem , r<n
    where
    lem : m ≡ n + q * n + r
    lem = begin
      m               ≡⟨ sym (m+[n∸m]≡n (≮⇒≥ m≮n)) ⟩
      n + (m ∸ n)     ≡⟨ cong (λ x → n + x) e ⟩
      n + (q * n + r) ≡⟨ sym (+-assoc n (q * n) r) ⟩
      n + q * n + r   ∎

Rather than trying to read it, the reader is urged to try writing it on their
own.




Inductive definition. For general culture, we shall also mention that it is
possible to implement euclidean division by structural induction, avoiding the
use of well-founded induction. This is the approach followed in Agda’s standard
library, in Data.Nat.DivMod. The trick consists in adding two extra arguments q
and r’ to the naive function, which will keep track of the quotient and
remainder (or, more precisely, n minus the remainder). Namely, given m and n, we
will perform our definition by induction on m. Initially q is 0 and r’ is n;
each time m is decreased by one,

— if r’ is strictly positive, we decrease it by one,
— if r’ is 0, we increase q by one and reset r’ to n.

Formally, the code follows:

euclid : (m n q r' : ℕ) → ℕ × ℕ
euclid zero n q r' = q , n ∸ r'
euclid (suc m) n q zero = euclid m n (suc q) n
euclid (suc m) n q (suc r') = euclid m n q r'

It can be shown that, for every m and n,

euclid m n zero n

computes the quotient and remainder of m by suc n (we consider here suc n in
order to ensure that the denominator is non-zero).
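
To experiment with this algorithm, here is a direct OCaml transcription, given as a sketch: the names euclid_acc and divmod are ours, natural numbers are represented by non-negative machine integers, and the wrapper divmod takes the actual divisor d (assumed strictly positive) instead of its predecessor.

```ocaml
(* Structural division: m decreases by one at each call, q accumulates the
   quotient and r' counts down from n, so that n - r' is the remainder. *)
let rec euclid_acc m n q r' =
  if m = 0 then (q, n - r')
  else if r' = 0 then euclid_acc (m - 1) n (q + 1) n
  else euclid_acc (m - 1) n q (r' - 1)

(* Quotient and remainder of m by d, for d > 0: as in the text, the
   accumulator version divides by the successor of its second argument. *)
let divmod m d = euclid_acc m (d - 1) 0 (d - 1)
```

For instance, divmod 7 2 unfolds to euclid_acc 7 1 0 1 and returns the pair (3, 1).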


Exercise 6.8.7.1. Show that this function is correct. 


Exercise 6.8.7.2. Give an intrinsic inductive definition of euclidean division. 


CHAPTER 7 


Formalization of important results


In this chapter, we sketch the formalization in Agda of important concepts
and results presented in this book: type safety (section 7.1), natural deduction
(section 7.2), λ-calculus (section 7.3), combinatory logic (section 7.4),
simply-typed λ-calculus (section 7.5).


7.1 Safety of a simple language 


In section 1.4.3, we have studied a simple typed language consisting of expres- 
sions manipulating booleans and integers and have shown the two fundamental 
properties satisfied by this language: subject reduction and progress. We now 
explain how those can be formalized in Agda. We begin by importing the re- 
quired libraries, and renaming and hiding symbols so that we can redefine those 
used by the standard library: 


open import Data.Bool hiding (if_then_else_ ; _≤_ ; _<_ ; _<?_)
open import Data.Nat renaming (_+_ to _+ℕ_ ; _<?_ to _<?ℕ_)
                     hiding (_<_)
open import Relation.Nullary


The language. A value in the language is either a natural number or a boolean: 


data Value : Set where
  VNat : ℕ → Value
  VBool : Bool → Value


A program is either a value, an addition, a comparison, or a conditional branch- 
ing: 


data Prog : Set where
  V : Value → Prog
  _+_ : Prog → Prog → Prog
  _<_ : Prog → Prog → Prog
  if_then_else_ : Prog → Prog → Prog → Prog


and we assign priorities to these constructors, in order to ease the writing of 
programs: 


infix 50 _+_
infix 40 _<_
infix 30 if_then_else_


We will need to compare natural numbers with this function, which returns a 
boolean depending on whether the first is strictly smaller than the second or 
not: 

_<?_ : ℕ → ℕ → Bool
m <? n with m <?ℕ n
(m <? n) | yes _ = true
(m <? n) | no _ = false


We then define the reduction relation as an inductive binary predicate _⇒_, so
that, given programs p and q, a proof of p ⇒ q corresponds to a derivation of
$p \longrightarrow q$ using the rules of figure 1.1: we add one constructor to
this inductive predicate for each inference rule.

data _⇒_ : Prog → Prog → Set where
  ⇒-Add : (m n : ℕ) →
    V (VNat m) + V (VNat n) ⇒ V (VNat (m +ℕ n))
  ⇒-Add-l : {p p' : Prog} → p ⇒ p' → (q : Prog) →
    p + q ⇒ p' + q
  ⇒-Add-r : {q q' : Prog} → (p : Prog) → q ⇒ q' →
    p + q ⇒ p + q'
  ⇒-Lt : (m n : ℕ) →
    V (VNat m) < V (VNat n) ⇒ V (VBool (m <? n))
  ⇒-Lt-l : {p p' : Prog} → p ⇒ p' → (q : Prog) →
    p < q ⇒ p' < q
  ⇒-Lt-r : {q q' : Prog} → (p : Prog) → q ⇒ q' →
    p < q ⇒ p < q'
  ⇒-If : {p p' : Prog} → p ⇒ p' → (q r : Prog) →
    if p then q else r ⇒ if p' then q else r
  ⇒-If-t : (p q : Prog) →
    if V (VBool true) then p else q ⇒ p
  ⇒-If-f : (p q : Prog) →
    if V (VBool false) then p else q ⇒ q


Typing. We now define the typing system of our language, starting with the
definition of types: a type is either that of natural numbers or that of
booleans:

data Type : Set where
  TNat TBool : Type


We then define the typing relation as an inductive binary predicate ⊢_∷_, so
that a proof of ⊢ p ∷ A for a program p and type A corresponds precisely to a
proof of $\vdash p : A$ using the type inference rules given in section 1.4.3:

data ⊢_∷_ : Prog → Type → Set where
  ⊢-Nat : (n : ℕ) →
    ⊢ V (VNat n) ∷ TNat
  ⊢-Bool : (b : Bool) →
    ⊢ V (VBool b) ∷ TBool
  ⊢-Add : {p q : Prog} → ⊢ p ∷ TNat → ⊢ q ∷ TNat →
    ⊢ p + q ∷ TNat
  ⊢-Lt : {p q : Prog} → ⊢ p ∷ TNat → ⊢ q ∷ TNat →
    ⊢ p < q ∷ TBool
  ⊢-If : {p q r : Prog} {A : Type} →
    ⊢ p ∷ TBool → ⊢ q ∷ A → ⊢ r ∷ A →
    ⊢ if p then q else r ∷ A

Type uniqueness. This formalization of typing as an inductive predicate has a
very interesting byproduct: the dependent pattern matching algorithm knows,
given the constructor of a program, the possible types this program can have
(and conversely, given a type, the possible program constructors which will give
rise to this type). Thanks to this, showing type uniqueness (theorem 1.4.3.1) is
simple:

tuniq : {p : Prog} {A A' : Type} → ⊢ p ∷ A → ⊢ p ∷ A' → A ≡ A'
tuniq (⊢-Nat n) (⊢-Nat .n) = refl
tuniq (⊢-Bool b) (⊢-Bool .b) = refl
tuniq (⊢-Add t u) (⊢-Add t' u') = refl
tuniq (⊢-Lt t u) (⊢-Lt t' u') = refl
tuniq (⊢-If t u v) (⊢-If t' u' v') = tuniq v v'

For instance, in the first case (the program is a natural number), Agda infers
that its type is necessarily TNat and therefore A and A' must be equal (to TNat).


Subject reduction. The subject reduction theorem (theorem 1.4.3.2) states that
if a program p reduces to p' and p admits the type A, then p' also admits the
type A. The proof is most easily done by induction on the derivation of
$p \longrightarrow p'$:

sred : {p p' : Prog} {A : Type} → (p ⇒ p') → ⊢ p ∷ A → ⊢ p' ∷ A
sred (⇒-Add m n) (⊢-Add _ _) = ⊢-Nat (m +ℕ n)
sred (⇒-Add-l r q) (⊢-Add t t') = ⊢-Add (sred r t) t'
sred (⇒-Add-r p r) (⊢-Add t t') = ⊢-Add t (sred r t')
sred (⇒-Lt m n) (⊢-Lt t t') = ⊢-Bool (m <? n)
sred (⇒-Lt-l r q) (⊢-Lt t t') = ⊢-Lt (sred r t) t'
sred (⇒-Lt-r p r) (⊢-Lt t t') = ⊢-Lt t (sred r t')
sred (⇒-If p q r) (⊢-If t t₁ t₂) = ⊢-If (sred p t) t₁ t₂
sred (⇒-If-t p q) (⊢-If t t₁ t₂) = t₁
sred (⇒-If-f p q) (⊢-If t t₁ t₂) = t₂


Progress. The last important property of our typed language is progress
(theorem 1.4.3.3) which states that a typable program is either a value or
reduces to some other program. Given a program p which admits a type A, the
proof is performed by induction on the derivation of ⊢ p ∷ A:

prgs : {p : Prog} {A : Type} → ⊢ p ∷ A →
       Σ Value (λ v → p ≡ V v) ⊎ Σ Prog (λ p' → p ⇒ p')
prgs (⊢-Nat n) = inj₁ (VNat n , refl)
prgs (⊢-Bool b) = inj₁ (VBool b , refl)
prgs (⊢-Add t t') with prgs t
prgs (⊢-Add t t') | inj₁ (v , e) with prgs t'
prgs (⊢-Add t t') | inj₁ (VNat m , refl) | inj₁ (VNat n , refl) =
  inj₂ (V (VNat (m +ℕ n)) , ⇒-Add m n)
prgs (⊢-Add t ()) | inj₁ (VNat m , refl) | inj₁ (VBool b , refl)
prgs (⊢-Add () t') | inj₁ (VBool b , refl) | inj₁ (v' , refl)
prgs (⊢-Add {p} {q} t t') | inj₁ (v , e) | inj₂ (q' , r) =
  inj₂ (p + q' , ⇒-Add-r p r)
prgs (⊢-Add {p} {q} t t') | inj₂ (p' , r) =
  inj₂ ((p' + q) , ⇒-Add-l r q)
prgs (⊢-Lt t t') with prgs t
prgs (⊢-Lt t t') | inj₁ (VNat m , refl) with prgs t'
prgs (⊢-Lt t t') | inj₁ (VNat m , refl) | inj₁ (VNat n , refl) =
  inj₂ ((V (VBool (m <? n))) , ⇒-Lt m n)
prgs (⊢-Lt t ()) | inj₁ (VNat m , refl) | inj₁ (VBool b , refl)
prgs (⊢-Lt {p} {q} t t') | inj₁ (VNat m , refl) | inj₂ (q' , r) =
  inj₂ (V (VNat m) < q' , ⇒-Lt-r (V (VNat m)) r)
prgs (⊢-Lt () t') | inj₁ (VBool b , refl)
prgs (⊢-Lt {p} {q} t t') | inj₂ (p' , r) =
  inj₂ (p' < q , ⇒-Lt-l r q)
prgs (⊢-If t t₁ t₂) with prgs t
prgs (⊢-If () t₁ t₂) | inj₁ (VNat x , refl)
prgs (⊢-If {_} {q} {r} t t₁ t₂) | inj₁ (VBool false , refl) =
  inj₂ (r , ⇒-If-f q r)
prgs (⊢-If {_} {q} {r} t t₁ t₂) | inj₁ (VBool true , refl) =
  inj₂ (q , ⇒-If-t q r)
prgs (⊢-If {p} {q} {r} t t₁ t₂) | inj₂ (p' , pr) =
  inj₂ (if p' then q else r , ⇒-If pr q r)
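
The computational content of these proofs can be illustrated by an untyped OCaml counterpart. The following sketch is not the formalization itself: the names infer, step and normalize are ours, and step fixes one particular strategy (reduce the left operand first, as the progress proof does), whereas the relation ⇒ also allows reducing the right operand at any time.

```ocaml
type value = VNat of int | VBool of bool

type prog =
  | V of value
  | Add of prog * prog
  | Lt of prog * prog
  | If of prog * prog * prog

type ty = TNat | TBool

(* Type inference: [Some a] when the program admits the type [a]. *)
let rec infer = function
  | V (VNat _) -> Some TNat
  | V (VBool _) -> Some TBool
  | Add (p, q) ->
    (match infer p, infer q with
     | Some TNat, Some TNat -> Some TNat
     | _ -> None)
  | Lt (p, q) ->
    (match infer p, infer q with
     | Some TNat, Some TNat -> Some TBool
     | _ -> None)
  | If (p, q, r) ->
    (match infer p, infer q, infer r with
     | Some TBool, Some a, Some b when a = b -> Some a
     | _ -> None)

(* One reduction step; [None] when no rule applies, i.e. when the program
   is a value or is stuck. *)
let rec step = function
  | V _ -> None
  | Add (V (VNat m), V (VNat n)) -> Some (V (VNat (m + n)))
  | Add (V v, q) -> Option.map (fun q' -> Add (V v, q')) (step q)
  | Add (p, q) -> Option.map (fun p' -> Add (p', q)) (step p)
  | Lt (V (VNat m), V (VNat n)) -> Some (V (VBool (m < n)))
  | Lt (V v, q) -> Option.map (fun q' -> Lt (V v, q')) (step q)
  | Lt (p, q) -> Option.map (fun p' -> Lt (p', q)) (step p)
  | If (V (VBool true), q, _) -> Some q
  | If (V (VBool false), _, r) -> Some r
  | If (p, q, r) -> Option.map (fun p' -> If (p', q, r)) (step p)

(* Reduce as long as some rule applies. *)
let rec normalize p =
  match step p with
  | None -> p
  | Some p' -> normalize p'
```

In this reading, progress says that step never returns None on a program which is typable and is not a value, and subject reduction says that infer gives the same answer before and after a step.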


Exercise 7.1.0.1. Formalize type inference and show that 


— it is correct: if a type is inferred for a program then the program actually 
admits this type, 


— it is complete: if a program is typable then type inference will return a 
type. 


7.2 Natural deduction 


The proofs in natural deduction are presented in section 2.2; we now briefly
present how those can be formalized in Agda. For conciseness, we only present
here the implicative fragment.


Formulas. A formula is either a variable (whose name is given by a natural 
number) or the implication of two formulas (see section 2.2.1): 


data Formula : Set where
  X : ℕ → Formula
  _⇒_ : Formula → Formula → Formula

Contexts. Next, we formalize a context as being either the empty context ε or
a pair Γ , A consisting of a context Γ and a formula A:

data Context : Set where
  ε : Context
  _,_ : (Γ : Context) → (A : Formula) → Context

We could also have formalized contexts as lists of formulas, but the above
formalization allows for a slightly more natural notation. We write Γ ,, Δ for
the concatenation of two contexts Γ and Δ:

_,,_ : Context → Context → Context
Γ ,, ε = Γ
Γ ,, (Δ , A) = (Γ ,, Δ) , A


Provable sequents. We can define the type Γ ⊢ A of provable sequents as an
inductive predicate, with one constructor corresponding to each inference rule:

data _⊢_ : Context → Formula → Set where
  ax : ∀ {Γ A Γ'} → Γ , A ,, Γ' ⊢ A
  ⇒E : ∀ {Γ A B} → Γ ⊢ A ⇒ B → Γ ⊢ A → Γ ⊢ B
  ⇒I : ∀ {Γ A B} → Γ , A ⊢ B → Γ ⊢ A ⇒ B

(the axiom rule and the elimination and introduction rules for implication).
This formalization is not very convenient because the argument of the
constructor ax uses concatenation “,,” which is a function and not a type
constructor, and will prevent pattern matching from working: unlike a
constructor, this function does not have the property that

Γ ,, Δ ≡ Γ' ,, Δ' implies Γ ≡ Γ' and Δ ≡ Δ'

In order to overcome this problem, we choose instead to formalize provable
sequents as

data _⊢_ : Context → Formula → Set where
  ax : ∀ {Γ A} → Γ , A ⊢ A
  wk : ∀ {Γ A B} → Γ ⊢ B → Γ , A ⊢ B
  ⇒E : ∀ {Γ A B} → Γ ⊢ A ⇒ B → Γ ⊢ A → Γ ⊢ B
  ⇒I : ∀ {Γ A B} → Γ , A ⊢ B → Γ ⊢ A ⇒ B


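which consists in replacing the usual axiom rule

$$\frac{}{\Gamma, A, \Gamma' \vdash A}\ (\mathrm{ax})$$

by the two rules

$$\frac{}{\Gamma, A \vdash A}\ (\mathrm{ax}) \qquad \frac{\Gamma \vdash B}{\Gamma, A \vdash B}\ (\mathrm{wk})$$

which give rise to an equivalent logical system.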
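
To make the equivalence concrete, here is a small OCaml sketch of a checker for proof terms of this system, showing how the general axiom is recovered from ax and wk. Everything in it is an assumption made for illustration: contexts are lists with the most recent hypothesis first, the names check and ax_general are ours, and the elimination rule carries the eliminated formula as an annotation so that checking needs no inference.

```ocaml
type formula = X of int | Imp of formula * formula

(* Proof terms for the system with ax, wk, ⇒E and ⇒I. In [ImpE (a, p, q)],
   p proves a ⇒ goal and q proves a. *)
type proof =
  | Ax
  | Wk of proof
  | ImpE of formula * proof * proof
  | ImpI of proof

(* [check ctx p goal] holds when p is a proof of goal in context ctx, where
   the list [a; b] stands for the context ε , b , a. *)
let rec check ctx p goal =
  match p, ctx, goal with
  | Ax, a :: _, _ -> a = goal
  | Wk p', _ :: ctx', _ -> check ctx' p' goal
  | ImpE (a, p', q'), _, _ ->
    check ctx p' (Imp (a, goal)) && check ctx q' a
  | ImpI p', _, Imp (a, b) -> check (a :: ctx) p' b
  | _ -> false

(* General axiom Γ , A ,, Γ' ⊢ A: weaken once for each of the n formulas
   of Γ', then use ax. *)
let rec ax_general n = if n = 0 then Ax else Wk (ax_general (n - 1))
```

For instance, with Γ = ε , X 0 and Γ' = ε , X 2 , X 3, the term ax_general 2 is checked as a proof of X 1 in the context [X 3; X 2; X 1; X 0].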


Admissible rules. This can be used to show that the usual rules are admissible.
For instance, we can prove that the contraction rule

$$\frac{\Gamma, A, A, \Gamma' \vdash B}{\Gamma, A, \Gamma' \vdash B}\ (\mathrm{contr})$$

is admissible (see section 2.2.7) by induction, both on the context Γ' and on
the proof of the premise, by

cont : ∀ {Γ A B} Γ' → Γ , A , A ,, Γ' ⊢ B → Γ , A ,, Γ' ⊢ B
cont ε ax = ax
cont ε (wk p) = p
cont ε (⇒E p q) = ⇒E (cont ε p) (cont ε q)
cont ε (⇒I p) = ⇒I (cont (ε , _) p)
cont (Γ' , A) ax = ax
cont (Γ' , A) (wk p) = wk (cont Γ' p)
cont (Γ' , A) (⇒E p q) = ⇒E (cont (Γ' , A) p) (cont (Γ' , A) q)
cont (Γ' , A) (⇒I p) = ⇒I (cont (Γ' , A , _) p)

Similarly, admissibility of the cut rule

$$\frac{\Gamma \vdash A \qquad \Gamma, A, \Gamma' \vdash B}{\Gamma, \Gamma' \vdash B}\ (\mathrm{cut})$$

is shown by

cut : ∀ {Γ A B} Γ' → Γ ⊢ A → Γ , A ,, Γ' ⊢ B → Γ ,, Γ' ⊢ B
cut ε p ax = p
cut ε p (wk q) = q
cut ε p (⇒E q r) = ⇒E (cut ε p q) (cut ε p r)
cut ε p (⇒I q) = ⇒I (cut (ε , _) p q)
cut (Γ' , A) p ax = ax
cut (Γ' , A) p (wk q) = wk (cut Γ' p q)
cut (Γ' , A) p (⇒E q r) = ⇒E (cut (Γ' , A) p q) (cut (Γ' , A) p r)
cut (Γ' , A) p (⇒I q) = ⇒I (cut (Γ' , A , _) p q)


Exercise 7.2.0.1. Formalize the admissibility of the other rules presented in 
section 2.2.7. 


7.3 Pure λ-calculus


In this section, we present a formalization of λ-calculus in Agda, using de
Bruijn indices.


7.3.1 Naive approach. We can first think of directly translating the definition
of λ-terms given in section 3.1. We suppose fixed an infinite set of variables
(say, the strings),

Var : Set
Var = String

and define the syntax of λ-terms as

data Tm : Set where
  var : Var → Tm
  _·_ : Tm → Tm → Tm
  ƛ_,_ : Var → Tm → Tm

meaning that a term is either of the form var x (the variable x), or t · u (the
application of t to u) or ƛ x , t (the function which to x associates t). The
weird choice of symbols in the last case comes from the fact that the dot (.)
and lambda (λ) are reserved in Agda.

We could proceed in this way, but one should remember that λ-terms are
not the terms generated by the above syntax, but rather the elements of their
quotient under α-equivalence (section 3.1.3). This means that we will have to
define this equivalence and show that all the constructions we are going to
make are compatible with it. This is rather long and painful.


Exercise 7.3.1.1. Try to properly define β-reduction with this formalization.


7.3.2 De Bruijn indices. In order to efficiently handle the α-conversion
problem, we are going to use de Bruijn indices for variables, as presented in
section 3.6.2. We thus define terms as

data Tm : Set where
  var : ℕ → Tm
  _·_ : Tm → Tm → Tm
  ƛ_  : Tm → Tm

A term can thus be in one of the following forms:

- var x: the x-th variable with x a natural number,

- t · u: the application of a term t to a term u,

- ƛ t: the abstraction of the 0-th variable in t.

Lifting. The next thing we want to do is define β-reduction, but before being
able to do this, we first need to introduce helper functions in order to explicitly
manipulate variables, following section 3.6.2.

The first one is lifting, which can be thought of as creating a fresh variable
numbered x. After performing this operation, all the variable indices y which
are greater than or equal to x have to be increased by one in order to make room
for x. The new index of y after the creation of x is written ↑_x y and defined by

↑_x y = y        if y < x,
↑_x y = y + 1    if y ≥ x.

In Agda, this function can be defined by

↑ : ℕ → ℕ → ℕ
↑ zero    y       = suc y
↑ (suc x) zero    = zero
↑ (suc x) (suc y) = suc (↑ x y)

and we write ↑ x y for ↑_x y.

Conversely, the unlifting operation consists in removing an unused variable x.
After the removal, all the variable indices y which are greater than x have to be
decreased by one in order to fill in the “empty space” left by x. Their new
index will thus be ↓_x y, defined by

↓_x y = y        if y < x,
↓_x y = y - 1    if y > x.

The function is not defined when y = x, because we have supposed that the
variable x is not used. In Agda, this can be defined as

↓ : (x y : ℕ) → x ≢ y → ℕ
↓ zero    zero    ¬p = ⊥-elim (¬p refl)
↓ zero    (suc y) ¬p = y
↓ (suc x) zero    ¬p = zero
↓ (suc x) (suc y) ¬p = suc (↓ x y (λ p → ¬p (cong suc p)))

and we write ↓ x y p for ↓_x y: in addition to x and y, the Agda function takes
a third argument p which is a proof that x is different from y.
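
As a quick sanity check, the two operations can be evaluated on concrete
indices. The following is a minimal self-contained sketch, assuming a recent
version of the Agda standard library; the definitions are repeated from above
and the anonymous definitions named _ are examples added here, each checked by
computation with refl.

```agda
open import Data.Nat using (ℕ; zero; suc)
open import Data.Empty using (⊥-elim)
open import Relation.Binary.PropositionalEquality
  using (_≡_; _≢_; refl; cong)

-- Lifting: make room for a fresh variable numbered x.
↑ : ℕ → ℕ → ℕ
↑ zero    y       = suc y
↑ (suc x) zero    = zero
↑ (suc x) (suc y) = suc (↑ x y)

-- Unlifting: remove the unused variable numbered x.
↓ : (x y : ℕ) → x ≢ y → ℕ
↓ zero    zero    ¬p = ⊥-elim (¬p refl)
↓ zero    (suc y) ¬p = y
↓ (suc x) zero    ¬p = zero
↓ (suc x) (suc y) ¬p = suc (↓ x y (λ p → ¬p (cong suc p)))

-- Creating a fresh variable 2: indices below 2 are unchanged...
_ : ↑ 2 1 ≡ 1
_ = refl

-- ...whereas indices from 2 on are shifted by one.
_ : ↑ 2 2 ≡ 3
_ = refl

_ : ↑ 2 5 ≡ 6
_ = refl

-- Removing the unused variable 2: the proof that 2 ≢ 5 is the
-- absurd function, since 2 ≡ 5 has no inhabitant.
_ : ↓ 2 5 (λ ()) ≡ 4
_ = refl
```

Note that the result of unlifting does not depend on the proof given as third
argument: it is only there to rule out the case where both indices are equal.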

The above lifting operation can be extended to λ-terms. Given a variable x
and a λ-term t, the term ↑_x t obtained after creating a fresh variable x will be
written here wk x t, because it is thought of as some form of weakening for the
term t. The weakening function wk is defined here by

wk : ℕ → Tm → Tm
wk x (var y)  = var (↑ x y)
wk x (t · t') = wk x t · wk x t'
wk x (ƛ t)    = ƛ (wk (suc x) t)

This definition uses lifting on variables, and recursively applies weakening for
applications and abstractions. There is a subtlety for the last case: since the
abstraction binds the variable 0 in a term t, a variable x in λ.t corresponds to
the variable x + 1 in t, which explains why we have to increase by one the
weakened variable when going under abstractions.


Substitution. We can then define substitution, as detailed in section 3.6.2:

_[_/_] : Tm → Tm → ℕ → Tm
var y [ u / x ] with x ≟ y
(var y [ u / _ ]) | yes _ = u
(var y [ u / x ]) | no ¬p = var (↓ x y ¬p)
(t · t') [ u / x ] = (t [ u / x ]) · (t' [ u / x ])
(ƛ t) [ u / x ]    = ƛ (t [ wk 0 u / ↑ 0 x ])

The two subtle corner cases when substituting a variable x by a term u in a
term t are:

- all the variables different from x have to be renumbered using ↓ since
  substitution removes all occurrences of x, which is supposed not to be free
  in u,

- when going under an abstraction, the term u has to be weakened using wk
  and the variable x has to be renamed using ↑ in order to account for the
  fact that the variable 0 is bound by the abstraction.
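
Both corner cases can be observed on a small example. The sketch below is
self-contained and assumes a recent version of the Agda standard library; the
definitions repeat those of this section (with extra parentheses so that no
fixity declaration is needed) and the final example is an addition made here.

```agda
open import Data.Nat using (ℕ; zero; suc; _≟_)
open import Data.Empty using (⊥-elim)
open import Relation.Nullary using (yes; no)
open import Relation.Binary.PropositionalEquality
  using (_≡_; _≢_; refl; cong)

data Tm : Set where
  var : ℕ → Tm
  _·_ : Tm → Tm → Tm
  ƛ_  : Tm → Tm

↑ : ℕ → ℕ → ℕ
↑ zero    y       = suc y
↑ (suc x) zero    = zero
↑ (suc x) (suc y) = suc (↑ x y)

↓ : (x y : ℕ) → x ≢ y → ℕ
↓ zero    zero    ¬p = ⊥-elim (¬p refl)
↓ zero    (suc y) ¬p = y
↓ (suc x) zero    ¬p = zero
↓ (suc x) (suc y) ¬p = suc (↓ x y (λ p → ¬p (cong suc p)))

wk : ℕ → Tm → Tm
wk x (var y)  = var (↑ x y)
wk x (t · t') = (wk x t) · (wk x t')
wk x (ƛ t)    = ƛ (wk (suc x) t)

_[_/_] : Tm → Tm → ℕ → Tm
(var y) [ u / x ] with x ≟ y
... | yes _ = u
... | no ¬p = var (↓ x y ¬p)
(t · t') [ u / x ] = (t [ u / x ]) · (t' [ u / x ])
(ƛ t) [ u / x ]    = ƛ (t [ wk 0 u / ↑ 0 x ])

-- Under the abstraction, the free variable 0 is seen as var 1 and
-- the substituted term var 5 is weakened into var 6, while the
-- bound variable var 0 is left untouched.
_ : (ƛ (var 0 · var 1)) [ var 5 / 0 ] ≡ ƛ (var 0 · var 6)
_ = refl
```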


The β-reduction. Once substitution is defined, we can define β-reduction by
following the usual definition, which is given in section 3.2.1: we implement it as
an inductive predicate with one constructor for each inference rule defining the
reduction.

data _⇝_ : Tm → Tm → Set where
  ⇝β : {t u : Tm} → (ƛ t) · u ⇝ t [ u / 0 ]
  ⇝l : {t t' u : Tm} → t ⇝ t' → t · u ⇝ t' · u
  ⇝r : {t u u' : Tm} → u ⇝ u' → t · u ⇝ t · u'
  ⇝ƛ : {t t' : Tm} → t ⇝ t' → ƛ t ⇝ ƛ t'


The iterated β-reduction relation ⇝* is the reflexive and transitive closure of the
β-reduction relation ⇝. In order to define it, we can use the module

Relation.Binary.Construct.Closure.ReflexiveTransitive

of the standard library which defines the closure of any relation by

data Star {A : Set} (R : Rel A) : Rel A where
  ε   : {x : A} → Star R x x
  _◅_ : {x y z : A} → R x y → Star R y z → Star R x z

which is based on the characterization given in lemma A.1.2.1. We can therefore
define the relation ⇝* by

_⇝*_ : Tm → Tm → Set
_⇝*_ = Star _⇝_
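
To illustrate how this closure is manipulated, here is a small self-contained
sketch showing that it is transitive, by concatenating sequences of steps. It
restates the simplified definition given above; the local definition of Rel is
an assumption made here for self-containedness (the one of the standard
library is universe-polymorphic), and _◅◅_ is the name under which the
standard library provides this operation.

```agda
-- A simplified (non universe-polymorphic) notion of binary relation.
Rel : Set → Set₁
Rel A = A → A → Set

infixr 5 _◅_ _◅◅_

-- Reflexive and transitive closure: lists of composable steps.
data Star {A : Set} (R : Rel A) : Rel A where
  ε   : {x : A} → Star R x x
  _◅_ : {x y z : A} → R x y → Star R y z → Star R x z

-- Transitivity, by induction on the first sequence of steps.
_◅◅_ : {A : Set} {R : Rel A} {x y z : A} →
       Star R x y → Star R y z → Star R x z
ε        ◅◅ ss = ss
(r ◅ rr) ◅◅ ss = r ◅ (rr ◅◅ ss)
```

This is the same recursion as the concatenation of lists, which is why the
constructors are written like those of lists.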


Church natural numbers. As explained in section 3.3.4, we can encode a natural
number n as the term λfx.fⁿx. We can define a function nat' which, given
a natural number n and two variables f and x, produces the term fⁿx by
induction on n by

nat' : (n : ℕ) (f x : ℕ) → Tm
nat' 0       f x = var x
nat' (suc n) f x = var f · nat' n f x

and the Church encoding of natural numbers can then be defined as

nat : ℕ → Tm
nat n = ƛ (ƛ (nat' n 1 0))
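
For instance, the encodings of 0 and 2 can be computed as follows. This is a
minimal self-contained sketch, assuming the Agda standard library; the
definitions are those given above (with explicit parentheses instead of
fixity declarations) and the two examples are additions made here.

```agda
open import Data.Nat using (ℕ; zero; suc)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

data Tm : Set where
  var : ℕ → Tm
  _·_ : Tm → Tm → Tm
  ƛ_  : Tm → Tm

-- The term fⁿx, where f and x are given as de Bruijn indices.
nat' : (n : ℕ) (f x : ℕ) → Tm
nat' zero    f x = var x
nat' (suc n) f x = (var f) · (nat' n f x)

-- Under the two abstractions, f has index 1 and x has index 0.
nat : ℕ → Tm
nat n = ƛ (ƛ (nat' n 1 0))

_ : nat 0 ≡ ƛ (ƛ (var 0))
_ = refl

_ : nat 2 ≡ ƛ (ƛ (var 1 · (var 1 · var 0)))
_ = refl
```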


The term computing the successor of a natural number can then be defined as

succ : Tm
succ = ƛ ƛ ƛ (var 1 · (var 2 · var 1 · var 0))

and the one computing the addition of two natural numbers as

add : Tm
add = ƛ ƛ ƛ ƛ (var 3 · var 1 · (var 2 · var 1 · var 0))


Exercise 7.3.2.1. Show that those two last terms are correct, in the sense that
they actually compute the successor and addition of natural numbers, i.e. we
have

succβ : (n : ℕ) → succ · nat n ⇝* nat (suc n)

and

addβ : (m n : ℕ) → add · nat m · nat n ⇝* nat (m + n)

In order to do so, you should be prepared to prove quite a few lemmas about
substitution and lifting (see below).
7.3.3 Keeping track of free variables. As a side note, let us present a
refinement of the above formalization. Since the implementation of λ-calculus
with de Bruijn indices is quite technical and error-prone, it is sometimes useful
to have the most precise type possible, in order to detect errors early. One way
to do so is to keep track of the free variables used in a term. Instead of defining
the type Tm of all terms, we can define, for each natural number n, the type
Tm n of terms whose free variables x are natural numbers such that 0 ≤ x < n.
This last constraint is conveniently described by requiring that x is an element
of type Fin n, see section 6.4.8. This refinement of the formalization avoids
inadvertently getting the wrong names for free variables and allows for reasoning
by induction on the number of free variables in terms. We thus define terms as

data Tm (n : ℕ) : Set where
  var : Fin n → Tm n
  _·_ : Tm n → Tm n → Tm n
  ƛ_  : Tm (suc n) → Tm n

In the last case, the term t should have at least one free variable, so that its
type is of the form Tm (suc n), and the abstraction will have one less free
variable since one variable was bound, so that the return type is Tm n.

Most previous functions can be adapted directly to this setting, so that we
only give the refined types for those. The type now makes it clear that lifting
inserts a fresh variable

↑ : {n : ℕ} → Fin (suc n) → Fin n → Fin (suc n)

as does weakening

wk : {n : ℕ} → Fin (suc n) → Tm n → Tm (suc n)

and that unlifting removes a variable

↓ : {n : ℕ} (x y : Fin (suc n)) → x ≢ y → Fin n

as does substitution

_[_/_] : {n : ℕ} → Tm (suc n) → Tm n → Fin (suc n) → Tm n

Finally, the type of reduction should indicate that it preserves free variables:

_⇝_ : {n : ℕ} → Tm n → Tm n → Set

The rest of the developments can be performed in this way. We do not present
those here because they are more cumbersome to perform: in all the proofs,
we have to show that the number of free variables is correctly handled. The
formalization of section 7.5 is also quite close to this one: there, in addition to
keeping track of the number of variables, we will also keep track of their type.
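
As an illustration of how directly the definitions adapt, here is a possible
implementation of lifting with the refined type. It is only a sketch, assuming
the Agda standard library: the clauses are the same as in the untyped case,
and the example at the end is an addition made here. The standard library
provides a function with the same type under the name punchIn in Data.Fin.

```agda
open import Data.Nat using (ℕ; suc)
open import Data.Fin using (Fin; zero; suc)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

-- Lifting now records that one more variable is available afterwards.
↑ : {n : ℕ} → Fin (suc n) → Fin n → Fin (suc n)
↑ zero    y       = suc y
↑ (suc x) zero    = zero
↑ (suc x) (suc y) = suc (↑ x y)

-- With three variables, creating a fresh variable 1 sends 2 to 3.
_ : ↑ {3} (suc zero) (suc (suc zero)) ≡ suc (suc (suc zero))
_ = refl
```

The constructor suc is overloaded here (natural numbers and elements of
Fin n): Agda resolves it from the expected type.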


7.3.4 Normalization by evaluation. As another side note, the reader having
read section 3.5.2 might think that it could be a good idea to use normalization
by evaluation in order to implement β-reduction instead of de Bruijn indices.
This suggests defining the following notions of value and neutral term:

data Value : Set
data Neutral : Set
data Value where
  ƛ_ : (Value → Value) → Value
  N  : Neutral → Value
data Neutral where
  var : Var → Neutral
  _·_ : Neutral → Value → Neutral

However, this definition is not accepted by Agda, which raises the following
error:

Value is not strictly positive, because it occurs to the left of
an arrow in the type of the constructor ƛ_ in the definition of
Value.

This is explained in section 8.4.4, where we show that removing the associated
restriction leads to Agda being inconsistent. We will see in section 7.5.3 that
we can nevertheless implement normalization by evaluation for simply typed
λ-calculus.


7.3.5 Confluence. Based on previous definitions, we now formalize one of the
main results: the confluence of β-reduction, following the proof given in
section 3.4 (see [Hue94] for an admirable other way to prove this).

The parallel β-reduction. We first define the parallel β-reduction by

data _⇉_ : Tm → Tm → Set where
  ⇉v : (x : ℕ) → var x ⇉ var x
  ⇉β : {t t' u u' : Tm} → t ⇉ t' → u ⇉ u' → (ƛ t) · u ⇉ t' [ u' / 0 ]
  ⇉a : {t t' u u' : Tm} → t ⇉ t' → u ⇉ u' → t · u ⇉ t' · u'
  ⇉ƛ : {t t' : Tm} → t ⇉ t' → ƛ t ⇉ ƛ t'

which mirrors the definition of section 3.4.2.


Local confluence of the parallel β-reduction. The local confluence of parallel
β-reduction (also called diamond property) states that given terms t, u and
v such that t parallel β-reduces to both u and v, there exists a term w such that
both u and v parallel β-reduce to w. The proof can be formalized in Agda by
case analysis on the reductions of t to u and t to v, closely following the proof
presented in lemma 3.4.3.5:

⇉-lc : {t u v : Tm} → t ⇉ u → t ⇉ v →
  Σ Tm (λ w → u ⇉ w × v ⇉ w)
⇉-lc (⇉v x) (⇉v .x) = var x , ⇉v x , ⇉v x
⇉-lc (⇉β r₁ r₂) (⇉β s₁ s₂) with ⇉-lc r₁ s₁ | ⇉-lc r₂ s₂
⇉-lc (⇉β r₁ r₂) (⇉β s₁ s₂) | w₁ , r₁' , s₁' | w₂ , r₂' , s₂' =
  w₁ [ w₂ / 0 ] , ⇉-sub 0 r₁' r₂' , ⇉-sub 0 s₁' s₂'
⇉-lc (⇉β r₁ r₂) (⇉a (⇉ƛ s₁) s₂) with ⇉-lc r₁ s₁ | ⇉-lc r₂ s₂
⇉-lc (⇉β r₁ r₂) (⇉a (⇉ƛ s₁) s₂) | w₁ , r₁' , s₁' | w₂ , r₂' , s₂' =
  w₁ [ w₂ / 0 ] , ⇉-sub 0 r₁' r₂' , ⇉β s₁' s₂'
⇉-lc (⇉a (⇉ƛ r₁) r₂) (⇉β s₁ s₂) with ⇉-lc r₁ s₁ | ⇉-lc r₂ s₂
... | w₁ , r₁' , s₁' | w₂ , r₂' , s₂' =
  w₁ [ w₂ / 0 ] , ⇉β r₁' r₂' , ⇉-sub 0 s₁' s₂'
⇉-lc (⇉a r₁ r₂) (⇉a s₁ s₂) with ⇉-lc r₁ s₁ | ⇉-lc r₂ s₂
... | (w₁ , r₁' , s₁') | (w₂ , r₂' , s₂') =
  w₁ · w₂ , ⇉a r₁' r₂' , ⇉a s₁' s₂'
⇉-lc (⇉ƛ r) (⇉ƛ s) with ⇉-lc r s
⇉-lc (⇉ƛ r) (⇉ƛ s) | w , r' , s' = ƛ w , (⇉ƛ r') , (⇉ƛ s')


Apart from recursive calls and the definition of parallel β-reduction, this proof
uses the lemma ⇉-sub which states that parallel β-reduction is compatible with
substitution: if t reduces to t' and u to u' then t[u/x] reduces to t'[u'/x]. The
proof follows the one of lemma 3.4.3.4:

⇉-sub : {t t' u u' : Tm} (x : ℕ) → t ⇉ t' → u ⇉ u' →
  t [ u / x ] ⇉ t' [ u' / x ]
⇉-sub x (⇉v y) ru with x ≟ y
⇉-sub x (⇉v y) ru | yes p = ru
⇉-sub x (⇉v y) ru | no ¬p = ⇉v (↓ x y ¬p)
⇉-sub x (⇉a rt₁ rt₂) ru = ⇉a (⇉-sub x rt₁ ru) (⇉-sub x rt₂ ru)
⇉-sub x (⇉ƛ rt) ru = ⇉ƛ (⇉-sub (suc x) rt (⇉-wk 0 ru))
⇉-sub x (⇉β {t} {t'} {u} {u'} rt₁ rt₂) ru =
  subst₂ _⇉_ refl
    (sym (sub-sub t' u' _ 0 x z≤n))
    (⇉β (⇉-sub (suc x) rt₁ (⇉-wk 0 ru)) (⇉-sub x rt₂ ru))


This function itself uses two auxiliary lemmas. The first one states that
reduction is compatible with weakening:

⇉-wk : {t t' : Tm} (x : ℕ) → t ⇉ t' → wk x t ⇉ wk x t'

and the second one is a form of commutation for double substitution:

sub-sub : ∀ t u v x y → x ≤ y →
  t [ u / x ] [ v / y ] ≡ t [ wk x v / suc y ] [ u [ v / y ] / x ]

The latter requires considering a large number of cases depending on the relative
values of x and y, and showing quite a few lemmas which were left behind the
curtain in section 3.4.3:

- commutation of liftings: when x ≤ y,

  ↑_x ↑_y z = ↑_{y+1} ↑_x z

- commutation of unliftings: when x ≥ y,

  ↓_x ↓_y z = ↓_y ↓_{x+1} z

- commutation of liftings and unliftings, in two variants depending on
  whether x ≥ y or x < y,

- commutation of weakenings: when x ≤ y,

  ↑_x ↑_y t = ↑_{y+1} ↑_x t

- commutation of weakening and substitution, which relates ↑_x (t[u/y]) to a
  substitution performed in ↑_x t, and

  (↑_x t)[u/x] = t

Details are left to the reader (and beware, they are of a quite combinatorial
nature).


Confluence of the parallel β-reduction. In order to deduce that the parallel
β-reduction is confluent, we first need to define the relation ⇉* as the
reflexive and transitive closure of the parallel β-reduction relation ⇉:

_⇉*_ : Tm → Tm → Set
_⇉*_ = Star _⇉_

We can formally show lemma 3.4.3.6, stating that parallel β-reduction satisfies
a property between local confluence and confluence, by

⇉-slconfl : {t u v : Tm} →
  t ⇉ u → t ⇉* v → Σ Tm (λ w → u ⇉* w × v ⇉ w)
⇉-slconfl {t} {u} {v} r ε = u , ε , r
⇉-slconfl r (s ◅ ss) with ⇉-lc r s
... | w' , s' , r' with ⇉-slconfl r' ss
... | w , ss' , r'' = w , s' ◅ ss' , r''

and deduce the confluence of the parallel β-reduction as in theorem 3.4.3.7 by

⇉-confl : {t u v : Tm} →
  t ⇉* u → t ⇉* v → Σ Tm (λ w → u ⇉* w × v ⇉* w)
⇉-confl {t} {u} {v} ε ss = v , ss , ε
⇉-confl {t} {u} {v} (r ◅ rr) ss with ⇉-slconfl r ss
... | w' , ss' , r' with ⇉-confl rr ss'
... | w , ss'' , rr' = w , ss'' , r' ◅ rr'


Confluence of the β-reduction. We can finally deduce the confluence of
β-reduction, following the proof presented in sections 3.4 and 3.4.3. We first define
the relation ⇝* as the reflexive and transitive closure of the β-reduction relation
⇝:

_⇝*_ : Tm → Tm → Set
_⇝*_ = Star _⇝_

We can show that if a term t β-reduces to a term u then t parallel β-reduces
to u (lemma 3.4.3.1):

⇝→⇉ : {t u : Tm} → t ⇝ u → t ⇉ u
⇝→⇉ ⇝β     = ⇉β ⇉-refl ⇉-refl
⇝→⇉ (⇝l r) = ⇉a (⇝→⇉ r) ⇉-refl
⇝→⇉ (⇝r r) = ⇉a ⇉-refl (⇝→⇉ r)
⇝→⇉ (⇝ƛ r) = ⇉ƛ (⇝→⇉ r)

where the reflexivity of parallel β-reduction (lemma 3.4.2.1) is shown with

⇉-refl : {t : Tm} → t ⇉ t
⇉-refl {var x}  = ⇉v x
⇉-refl {t · t'} = ⇉a ⇉-refl ⇉-refl
⇉-refl {ƛ t}    = ⇉ƛ ⇉-refl

From there, we can easily show that iterated β-reduction implies iterated parallel
β-reduction:

⇝*→⇉* : {t u : Tm} → t ⇝* u → t ⇉* u
⇝*→⇉* ε        = ε
⇝*→⇉* (r ◅ rr) = ⇝→⇉ r ◅ ⇝*→⇉* rr

Conversely, we can show that iterated parallel β-reduction implies iterated
β-reduction (see lemma 3.4.3.3, formal proof is left to the reader):

⇉*→⇝* : {t u : Tm} → t ⇉* u → t ⇝* u

We can finally use this to deduce the confluence of β-reduction (theorem 3.4.4.1)
from the one of parallel β-reduction shown above:

⇝-confl : {t u v : Tm} →
  t ⇝* u → t ⇝* v → Σ Tm (λ w → u ⇝* w × v ⇝* w)
⇝-confl rr ss with ⇉-confl (⇝*→⇉* rr) (⇝*→⇉* ss)
... | w , ss' , rr' = w , ⇉*→⇝* ss' , ⇉*→⇝* rr'


7.4 Combinatory logic 


Combinatory logic, which was presented in section 3.6.3, can be implemented
in a way similar to pure λ-calculus. We begin by describing the type CL of
combinatory logic terms:

data CL : Set where
  var   : ℕ → CL
  _·_   : CL → CL → CL
  S K I : CL

A term is thus either a variable, an application of a term to another, or one
of the combinators S, K or I. Reduction of terms can then be formalized as a
binary inductive predicate with constructors expressing the reduction rules for
the combinators, as well as the fact that it is compatible with application:

data _⇒_ : CL → CL → Set where
  ⇒-S : {T U V : CL} → S · T · U · V ⇒ (T · V) · (U · V)
  ⇒-K : {T U : CL} → K · T · U ⇒ T
  ⇒-I : {T : CL} → I · T ⇒ T
  ⇒-l : {T T' U : CL} → T ⇒ T' → T · U ⇒ T' · U
  ⇒-r : {T U U' : CL} → U ⇒ U' → T · U ⇒ T · U'

Abstraction. Following the definition given in section 3.6.3, we can define an
analogue of abstraction for combinatory logic terms, which takes as argument
a variable and a term in which the variable should be abstracted:

abs : ℕ → CL → CL
abs x (var y) with x ≟ y
abs x (var _) | yes p = I
abs x (var y) | no ¬p = K · var (↓ x y ¬p)
abs x (T · T') = S · abs x T · abs x T'
abs x S = K · S
abs x K = K · K
abs x I = K · I
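
The behavior of abstraction can be observed on small terms. The following
self-contained sketch assumes a recent version of the Agda standard library;
the definitions repeat those of the text, the fixity declaration is an
assumption made here so that application associates to the left, and the two
examples are additions.

```agda
open import Data.Nat using (ℕ; zero; suc; _≟_)
open import Data.Empty using (⊥-elim)
open import Relation.Nullary using (yes; no)
open import Relation.Binary.PropositionalEquality
  using (_≡_; _≢_; refl; cong)

infixl 20 _·_

data CL : Set where
  var   : ℕ → CL
  _·_   : CL → CL → CL
  S K I : CL

↓ : (x y : ℕ) → x ≢ y → ℕ
↓ zero    zero    ¬p = ⊥-elim (¬p refl)
↓ zero    (suc y) ¬p = y
↓ (suc x) zero    ¬p = zero
↓ (suc x) (suc y) ¬p = suc (↓ x y (λ p → ¬p (cong suc p)))

abs : ℕ → CL → CL
abs x (var y) with x ≟ y
... | yes _ = I
... | no ¬p = K · var (↓ x y ¬p)
abs x (T · T') = S · abs x T · abs x T'
abs x S = K · S
abs x K = K · K
abs x I = K · I

-- Abstracting a variable in itself gives the identity.
_ : abs 0 (var 0) ≡ I
_ = refl

-- Abstracting variable 0 in the application of variable 1 to it:
-- the other variable is renumbered, since variable 0 disappears.
_ : abs 0 (var 1 · var 0) ≡ S · (K · var 0) · I
_ = refl
```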


Our aim is now to show that this is a reasonable notion of abstraction. In order
to do so, we first define the substitution in a term T of a variable x by a term U:

_[_≕_] : CL → CL → ℕ → CL
var y [ U ≕ x ] with x ≟ y
(var y [ U ≕ _ ]) | yes p = U
(var y [ U ≕ x ]) | no ¬p = var (↓ x y ¬p)
(T · T') [ U ≕ x ] = (T [ U ≕ x ]) · (T' [ U ≕ x ])
S [ U ≕ x ] = S
K [ U ≕ x ] = K
I [ U ≕ x ] = I

We also need to consider the reflexive and transitive closure ⇒* of the reduction
relation ⇒ by

_⇒*_ : CL → CL → Set
_⇒*_ = Star _⇒_

Finally, we can formalize lemma 3.6.3.5, which states that abstraction behaves
as expected: (Λx.T) U reduces to T[U/x].

cl-β : (x : ℕ) (T U : CL) → (abs x T) · U ⇒* T [ U ≕ x ]
cl-β x (var y) U with x ≟ y
cl-β x (var _) U | yes p = ⇒-I ◅ ε
cl-β x (var y) U | no ¬p = ⇒-K ◅ ε
cl-β x (T · U) V = ⇒-S ◅ ⇒*-· (cl-β x T V) (cl-β x U V)
cl-β x S U = ⇒-K ◅ ε
cl-β x K U = ⇒-K ◅ ε
cl-β x I U = ⇒-K ◅ ε

This proof uses the auxiliary lemma

⇒*-· : {T T' U U' : CL} → T ⇒* T' → U ⇒* U' → T · U ⇒* T' · U'

which states that reduction is compatible with application and whose proof
is left to the reader.

Exercise 7.4.0.1. Formalize the translations between λ-terms and combinatory
logic terms of section 3.6.3, i.e. define functions

icl : Tm → CL
icl (var x)  = var x
icl (t · t') = icl t · icl t'
icl (ƛ t)    = abs zero (icl t)




and

ilam : CL → Tm
ilam (var x)  = var x
ilam (T · T') = ilam T · ilam T'
ilam S = ƛ ƛ ƛ (var 2 · var 0 · (var 1 · var 0))
ilam K = ƛ ƛ var (suc 0)
ilam I = ƛ (var 0)

and show various lemmas expressing preservation of reduction such as lemma 3.6.3.7:

ilam-red : {T U : CL} → T ⇒ U → ilam T ⇝* ilam U


7.5 Simply typed λ-calculus

7.5.1 Definition. We now present a formalization of simply typed λ-calculus
introduced in chapter 4. The reader is strongly advised to try this by himself
before reading the section: what is easy to read is not necessarily easy to write!
This is inspired by the excellent course [WK19].


Types. We suppose fixed an infinite countable set of type variables, say the
natural numbers:

TVar : Set
TVar = ℕ

and the types are inductively defined as type variables or arrows between
types:

data Type : Set where
  X   : TVar → Type
  _⇒_ : Type → Type → Type

Contexts. A context is simply a list of types. However, in order to adopt the
usual notations, instead of defining the type Ctxt of contexts as List Type, we
use the following definition:

data Ctxt : Set where
  ∅   : Ctxt
  _,_ : Ctxt → Type → Ctxt

A context is thus either the empty context ∅ or of the form Γ , A for some
context Γ and type A.


Terms. The type Γ ⊢ A of terms of type A in the context Γ is defined by induction
by

data _⊢_ : Ctxt → Type → Set where
  var : ∀ {Γ A} → Γ ∋ A → Γ ⊢ A
  _·_ : ∀ {Γ A B} → Γ ⊢ (A ⇒ B) → Γ ⊢ A → Γ ⊢ B
  ƛ_  : ∀ {Γ A B} → Γ , A ⊢ B → Γ ⊢ A ⇒ B

where the constructors of the inductive type correspond to the typing rules of
simply typed λ-calculus given in section 4.1.4. In this formalization, we are
right in the middle of the Curry-Howard correspondence: a proof that Γ ⊢ A is
derivable is precisely a λ-term t of type A in the context Γ. In the constructor
corresponding to variables, we use Γ ∋ A, which is the type of proofs that a type
A belongs to Γ: such a proof essentially consists of a natural number n such that
the n-th element of Γ is A. Formally, it can be defined as follows:

data _∋_ : Ctxt → Type → Set where
  zero : ∀ {Γ A} → (Γ , A) ∋ A
  suc  : ∀ {Γ B A} → Γ ∋ A → (Γ , B) ∋ A

Note that this corresponds to identifying variables by their de Bruijn index in
the context.
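
To get a feeling for these definitions, here are a few well-typed terms
written in this style. This is a self-contained sketch, assuming the Agda
standard library: the data types repeat those of this section, whereas the
fixity declarations and the names x₀, id₀ and k₀ are assumptions made here
for the sake of the example.

```agda
open import Data.Nat using (ℕ)

infixr 6 _⇒_
infixl 5 _,_
infix  4 _∋_ _⊢_

data Type : Set where
  X   : ℕ → Type
  _⇒_ : Type → Type → Type

data Ctxt : Set where
  ∅   : Ctxt
  _,_ : Ctxt → Type → Ctxt

data _∋_ : Ctxt → Type → Set where
  zero : ∀ {Γ A} → (Γ , A) ∋ A
  suc  : ∀ {Γ B A} → Γ ∋ A → (Γ , B) ∋ A

data _⊢_ : Ctxt → Type → Set where
  var : ∀ {Γ A} → Γ ∋ A → Γ ⊢ A
  _·_ : ∀ {Γ A B} → Γ ⊢ A ⇒ B → Γ ⊢ A → Γ ⊢ B
  ƛ_  : ∀ {Γ A B} → Γ , A ⊢ B → Γ ⊢ A ⇒ B

-- In the context ∅ , X 0 , X 1, the type X 0 has de Bruijn index 1.
x₀ : (∅ , X 0 , X 1) ∋ X 0
x₀ = suc zero

-- The identity function on X 0, in the empty context.
id₀ : ∅ ⊢ X 0 ⇒ X 0
id₀ = ƛ (var zero)

-- The first projection: under two binders, the first argument
-- has index 1.
k₀ : ∅ ⊢ X 0 ⇒ X 1 ⇒ X 0
k₀ = ƛ (ƛ (var (suc zero)))
```

An ill-typed or ill-scoped term, such as ƛ (var (suc zero)) in the empty
context, is rejected by the type checker: only typing derivations can be
written.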


Weakening. In order to define substitution and β-reduction, we have to make
use of the weakening rule

   Γ ⊢ t : A
----------------- (wk)
Γ, x : B ⊢ t : A

and thus need to show that this rule is admissible. A naive approach would
consist in trying to show the following corresponding lemma:

wk : ∀ {Γ A B} → Γ ⊢ A → Γ , B ⊢ A

However, we cannot manage to prove it because the induction hypothesis is not
strong enough in the case of abstraction: we have to show

 Γ, x : B, y : A ⊢ t : A'
----------------------------
Γ, x : B ⊢ λy^A.t : A → A'

and we cannot use the induction hypothesis on the premise because the
weakened variable x is not in the last position in the context. In order to overcome
this problem, we could prove the following generalization of the weakening rule:

    Γ, Δ ⊢ t : A
--------------------- (wk)
Γ, x : B, Δ ⊢ t : A

It will turn out equally easy and more natural to prove the following even more
general version:

Γ ⊢ t : A
---------
Δ ⊢ t : A

whenever Γ is obtained from Δ by removing multiple typed variables, which we
write Γ ⊆ Δ (this corresponds to performing at once multiple weakening rules in
the previous version). We thus define the “inclusion” relation between contexts
as

data _⊆_ : Ctxt → Ctxt → Set where
  ∅    : ∅ ⊆ ∅
  keep : ∀ {Γ Δ A} → Γ ⊆ Δ → Γ , A ⊆ Δ , A
  drop : ∀ {Γ Δ A} → Γ ⊆ Δ → Γ ⊆ Δ , A

and prove weakening as follows

wk : ∀ {Γ Δ A} → Γ ⊆ Δ → Γ ⊢ A → Δ ⊢ A
wk i (var x)  = var (wk-var i x)
wk i (t · t') = wk i t · wk i t'
wk i (ƛ t)    = ƛ wk (keep i) t

where, in the case of variables, we use the following lemma showing that if a
type belongs to a context, it still belongs to it if we add types to this context:

wk-var : ∀ {Γ Δ A} → Γ ⊆ Δ → Γ ∋ A → Δ ∋ A
wk-var (keep i) zero    = zero
wk-var (keep i) (suc x) = suc (wk-var i x)
wk-var (drop i) x       = suc (wk-var i x)

Finally, we can show that the first weakening rule considered above can be
deduced as the particular case where the inclusion is of the form Γ ⊆ Γ , A:

wk-last : ∀ {Γ A B} → Γ ⊢ A → Γ , B ⊢ A
wk-last t = wk (drop ⊆-refl) t

where ⊆-refl is a proof that inclusion is reflexive:

⊆-refl : ∀ {Γ} → Γ ⊆ Γ
⊆-refl {∅}     = ∅
⊆-refl {Γ , A} = keep ⊆-refl


Substitution. We can define substitution as follows. Given a term t in a
context Γ, a variable x in the context Γ and a term u of the right type, we want to
construct a term t[u/x] obtained by replacing all occurrences of x by u in t. It
turns out to be simpler to define a generalization of this operation, and replace
all the variables of Γ at once in t: given a function σ (a substitution) which to
a variable of Γ associates a term of appropriate type, we define the term t[σ]
obtained from t by replacing every free variable x by σ(x) as follows.

_[_] : ∀ {Γ Δ A} → Γ ⊢ A → (∀ {B} → Γ ∋ B → Δ ⊢ B) → Δ ⊢ A
var x [ σ ]    = σ x
(t · t') [ σ ] = (t [ σ ]) · (t' [ σ ])
(ƛ t) [ σ ]    = ƛ (t [ (λ { zero → var zero ;
                            (suc x) → wk-last (σ x) }) ])

In order to define β-reduction, we will only be interested in substituting the last
variable of the context, which can of course be recovered as a particular case:

_[_/0] : ∀ {Γ A B} → Γ , B ⊢ A → Γ ⊢ B → Γ ⊢ A
t [ u /0] = t [ (λ { zero → u ; (suc x) → var x }) ]


β-reduction. The β-reduction can then be defined as follows, similarly to the
case of untyped λ-calculus:

data _⇝_ {Γ : Ctxt} : {A : Type} → Γ ⊢ A → Γ ⊢ A → Set where
  ⇝β : ∀ {A B} (t : Γ , A ⊢ B) (u : Γ ⊢ A) →
       (ƛ t) · u ⇝ t [ u /0]
  ⇝l : ∀ {A B} {t t' : Γ ⊢ A ⇒ B} → t ⇝ t' → (u : Γ ⊢ A) →
       t · u ⇝ t' · u
  ⇝r : ∀ {A B} (t : Γ ⊢ A ⇒ B) → {u u' : Γ ⊢ A} → u ⇝ u' →
       t · u ⇝ t · u'
  ⇝ƛ : ∀ {A B} {t t' : Γ , A ⊢ B} → t ⇝ t' →
       ƛ t ⇝ ƛ t'

where we use substitution in the first case, as indicated above.


7.5.2 Strong normalization. In order to show the effectiveness of the
implementation performed in the previous section, we shall prove a major theorem
of λ-calculus: the strong normalization theorem (theorem 4.2.2.5) which states
that every typable term is strongly normalizing. We follow here the proof given
in section 4.2.2 using reducibility candidates. Similar proofs in Coq can be found
in [PdAC*10].

Strong normalizability. We first have to define what it means for the reduction
relation ⇝ to be halting, or strongly normalizing. A term t is halting when there
is no infinite reduction starting from it. It is however generally a bad idea to
define concepts by negation, because we lose constructivity, and we will not
directly adopt this definition. Instead, we will define by induction that a term t
is halting whenever all the terms it can reduce to are themselves halting:

data halts {Γ : Ctxt} {A : Type} : Γ ⊢ A → Set where
  sn : {t : Γ ⊢ A} → ({t' : Γ ⊢ A} → t ⇝ t' → halts t') → halts t

Note that we could have equivalently defined t to be halting when it is
accessible (see section 6.8.6) with respect to the opposite of the reduction relation:

halts : ∀ {Γ A} → Γ ⊢ A → Set
halts t = Acc _⇜_ t

where the opposite of the reduction relation is

_⇜_ : ∀ {Γ A} → Γ ⊢ A → Γ ⊢ A → Set
u ⇜ t = t ⇝ u


Induction on the reduction. We can define the iterated reduction relation as
usual by

_⇝*_ : ∀ {Γ A} → Γ ⊢ A → Γ ⊢ A → Set
_⇝*_ = Star _⇝_

Given a halting term t, the reduction relation ⇝ is terminating on terms u such
that t ⇝* u. We can thus reason by well-founded induction on it, i.e. we have
the following induction principle:

⇝-rec : ∀ {Γ A} {t : Γ ⊢ A} → halts t → (P : Γ ⊢ A → Set) →
  ({u : Γ ⊢ A} → t ⇝* u → ({v : Γ ⊢ A} → u ⇝ v → P v) → P u) →
  {u : Γ ⊢ A} → t ⇝* u → P u

whose proof is left to the reader.




Reducibility candidates. We formalize here the “typed” variant of reducibility
candidates discussed in remark 4.2.2.6. We define sets R_{Γ⊢A}, indexed by
contexts Γ and types A, consisting of terms of type A in the context Γ, by induction
on the type A by

R : {Γ : Ctxt} {A : Type} (t : Γ ⊢ A) → Set
R {Γ} {X _} t   = halts t
R {Γ} {A ⇒ B} t = {u : Γ ⊢ A} → R u → R (t · u)

The core of the proof then consists in showing the three properties of
proposition 4.2.2.1 satisfied by reducibility candidates:

CR1 : ∀ {Γ A} {t : Γ ⊢ A} →
  R t → halts t
CR2 : ∀ {Γ A} {t t' : Γ ⊢ A} →
  R t → t ⇝ t' → R t'
CR3 : ∀ {Γ A} {t : Γ ⊢ A} →
  neutral t → ({t' : Γ ⊢ A} → t ⇝ t' → R t') → R t

where neutral terms are characterized by the following predicate:

data neutral : ∀ {Γ A} → Γ ⊢ A → Set where
  nvar : ∀ {Γ A} (x : Γ ∋ A) → neutral (var x)
  napp : ∀ {Γ A B} (t : Γ ⊢ A ⇒ B) (u : Γ ⊢ A) → neutral (t · u)

As in the proof of the above proposition, we show all three properties together
and reason by induction on the type A. The formal proof is shown below:

CR1 {Γ} {X _} r = r
CR1 {Γ} {A ⇒ B} {t} r =
  halts-vapp t x? (CR1 (r (CR3 (nvar x?) (λ ()))))
CR2 {Γ} {X _} (sn f) b = f b
CR2 {Γ} {A ⇒ B} r b {u} Ru = CR2 (r Ru) (⇝l b u)
CR3 {Γ} {X _} n f = sn f
CR3 {Γ} {A ⇒ B} {t} n f {u} Ru = lem u ε
  where
  CR2* : {t t' : Γ ⊢ A} → t ⇝* t' → R t → R t'
  CR2* ε Rt = Rt
  CR2* {t} {t'} (b ◅ bb) Rt = CR2* bb (CR2 Rt b)
  lem : ∀ v → u ⇝* v → R (t · v)
  lem v u⇝*v = ⇝-rec (CR1 Ru) (λ v → R (t · v))
    (λ {w} u⇝*w ind →
      CR3 (napp t w)
        λ { (⇝l t⇝t' u) → f t⇝t' (CR2* u⇝*w Ru) ;
            (⇝r t w⇝w') → ind w⇝w' }
    ) u⇝*v


We do not detail it, because it follows closely the proof of proposition 4.2.2.1,
except for two points in the second case of CR1.

We recall that this part of the proof consists in showing that for every
term t ∈ R_{Γ⊢A→B}, we have that t is halting and goes on as follows.
Consider a variable x such that Γ ⊢ x : A is derivable: this variable is neutral and
thus in R_{Γ⊢A} by (CR3), therefore t x belongs to R_{Γ⊢B} and is thus halting,
from which we deduce that t must also be halting. The last step is taken care of
by the lemma halts-vapp, which states that if the term t x is terminating then t
is also terminating:

halts-vapp : ∀ {Γ A B} (t : Γ ⊢ A ⇒ B) → (x : Γ ∋ A) →
  halts (t · var x) → halts t

The (easy) proof is left to the reader. We have to confess that we have been
cheating in the above proof: there is no reason that we should have a variable x
such that Γ ⊢ x : A, unless A belongs to Γ, which nothing guarantees here (this
was not a problem in the proof of proposition 4.2.2.1, because we were working
in an untyped setting). In our Agda proof, we have simply been postulating the
existence of such a variable:

postulate x? : ∀ {Γ A} → Γ ∋ A

Of course, this is wrong, but it can be mitigated in two ways. First, if we had
a more full-fledged programming language with data types (natural numbers,
booleans, strings, etc.), we could prove that every program of a type which
does not contain type variables is terminating in the same way, by using values
instead of variables, and this would cover most cases of interest. For instance,
supposing that we have a type N of natural numbers, we can show that a term t
in the candidate associated to a type of the form N → A is terminating because,
by induction hypothesis, we have that t 5 is terminating, and reason as
above. Another way to solve the problem is to change slightly the proof of
the second case of CR1. Suppose that t ∈ R_{Γ⊢A→B}: by weakening we have that
Γ, x : A ⊢ t : A → B, and now we have the variable x such that Γ, x : A ⊢ x : A. By
induction hypothesis we have that t x is halting, therefore the weakening of t is
terminating, and therefore t is terminating. In practice, this makes the proof
much more delicate, because we have to explicitly deal with matters related to
weakening: in Agda, the weakening of t is not the same as t. Moreover, the
definition of reducibility candidates has to be slightly generalized in order to
take weakening into account and have the right induction hypothesis [Sak14]:

R : {Γ : Ctxt} {A : Type} (t : Γ ⊢ A) → Set
R {Γ} {X _} t   = halts t
R {Γ} {A ⇒ B} t = {Γ' : Ctxt} {u : Γ' ⊢ A}
  (i : Γ ⊆ Γ') → R u → R (wk i t · u)


Strong normalization. Finally, we can deduce that simply typed λ-terms are
strongly normalizing by following section 4.2.2. We do not detail the proofs
here. Lemma 4.2.2.2 can be formalized as

R-abs : ∀ {Γ A B} (t : Γ , A ⊢ B) →
  ((u : Γ ⊢ A) → R (t [ u /0])) → R (ƛ t)

lemma 4.2.2.3 as

R-sub : ∀ {Γ A} (t : Γ ⊢ A)
  (σ : ∀ {B} → Γ ∋ B → Γ ⊢ B) →
  (∀ {B} → (x : Γ ∋ B) → R (σ x)) → R (t [ σ ])

the adequacy proposition 4.2.2.4 as

R-all : ∀ {Γ A} (t : Γ ⊢ A) → R t

and finally the strong normalization theorem 4.2.2.5 as

SN : ∀ {Γ A} (t : Γ ⊢ A) → halts t
SN t = CR1 (R-all t)

Exercise 7.5.2.1. Extend the language of section 7.1 adding abstractions and
show preservation of types, progress and strong normalization.


Weak normalization. In a typical study of a programming language, one is not
usually interested in showing that every possible way of reducing programs
terminates, but only that this is the case with the particular reduction strategy
used by the language. In this case, the above proof can be somewhat simplified,
as explained in section 4.2.5, which we now illustrate by showing that the
call-by-value reduction strategy terminates for the simply typed λ-calculus.
Following section 3.5.1, we can characterize values and neutral terms by

data value : ∀ {Γ A} (t : Γ ⊢ A) → Set
data neutral : ∀ {Γ A} (t : Γ ⊢ A) → Set

data value where
  vabs : ∀ {Γ} {A B} (t : Γ , A ⊢ B) → value (ƛ t)
  vneu : ∀ {Γ A} {t : Γ ⊢ A} → neutral t → value t
data neutral where
  nvar : ∀ {Γ A} (x : Γ ∋ A) → neutral (var x)
  napp : ∀ {Γ A B} {t : Γ ⊢ A ⇒ B} {u : Γ ⊢ A} →
         neutral t → value u → neutral (t · u)

The call-by-value reduction strategy is

data _⇝_ : ∀ {Γ A} → Γ ⊢ A → Γ ⊢ A → Set where
  ⇝β : ∀ {Γ A B} (t : Γ , A ⊢ B) → {u : Γ ⊢ A} →
       value u → (ƛ t) · u ⇝ (t [ u /0])
  ⇝l : ∀ {Γ A B} {t t' : Γ ⊢ A ⇒ B} →
       t ⇝ t' → (u : Γ ⊢ A) → t · u ⇝ t' · u
  ⇝r : ∀ {Γ A B} {t : Γ ⊢ A ⇒ B} {u u' : Γ ⊢ A} →
       value t → u ⇝ u' → t · u ⇝ t · u'


and the iterated reduction ↝* is defined as usual. It is not hard to show that
values and neutral terms are normal forms


value-nf   : ∀ {Γ A} {t t' : Γ ⊢ A} → value t → ¬ (t ↝ t')
neutral-nf : ∀ {Γ A} {t t' : Γ ⊢ A} → neutral t → ¬ (t ↝ t')


value-nf   (vneu (napp n v)) (↝l r u) = neutral-nf n r
value-nf   (vneu (napp n v)) (↝r t r) = value-nf v r
neutral-nf (napp n v) (↝l r u) = neutral-nf n r
neutral-nf (napp n v) (↝r t r) = value-nf v r


and that the reduction is deterministic 


det : ∀ {Γ A} {t t₁ t₂ : Γ ⊢ A} → t ↝ t₁ → t ↝ t₂ → t₁ ≡ t₂
det (↝β t u₁) (↝β .t u₂) = refl
det (↝β t u)  (↝r t' r)  = ⊥-elim (value-nf u r)
det (↝l r₁ u) (↝l r₂ .u) = cong₂ _·_ (det r₁ r₂) refl
det (↝l r₁ u) (↝r t r₂)  = ⊥-elim (value-nf t r₁)
det (↝r t r₁) (↝β t₁ u)  = ⊥-elim (value-nf u r₁)
det (↝r t r₁) (↝l r₂ u)  = ⊥-elim (value-nf t r₂)
det (↝r t r₁) (↝r x r₂)  = cong₂ _·_ refl (det r₁ r₂)
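

Since the reduction is deterministic, the strategy can be read as a partial function which sends a term to its unique reduct, when there is one. As an illustration outside of the Agda development, here is a minimal OCaml sketch on untyped terms with de Bruijn indices. The names lift, subst, is_value, is_neutral and step are ours (they are not part of the formalization above), and typing is ignored altogether.

```ocaml
(* Untyped λ-terms with de Bruijn indices (illustrative only). *)
type term = Var of int | Abs of term | App of term * term

(* Increment the free variables of a term whose index is at least c. *)
let rec lift c = function
  | Var i -> if i >= c then Var (i + 1) else Var i
  | Abs t -> Abs (lift (c + 1) t)
  | App (t, u) -> App (lift c t, lift c u)

(* Replace variable k by u and decrement the variables above k. *)
let rec subst k u = function
  | Var i -> if i = k then u else if i > k then Var (i - 1) else Var i
  | Abs t -> Abs (subst (k + 1) (lift 0 u) t)
  | App (t, t') -> App (subst k u t, subst k u t')

(* Values are abstractions or neutral terms; neutral terms are variables
   or neutral terms applied to values. *)
let rec is_value = function
  | Abs _ -> true
  | t -> is_neutral t
and is_neutral = function
  | Var _ -> true
  | App (t, u) -> is_neutral t && is_value u
  | Abs _ -> false

(* One step of call-by-value reduction, None on normal forms. *)
let rec step = function
  | App (Abs t, u) when is_value u -> Some (subst 0 u t)   (* β *)
  | App (t, u) ->
    (match step t with
     | Some t' -> Some (App (t', u))                        (* left *)
     | None ->
       if is_value t
       then Option.map (fun u' -> App (t, u')) (step u)     (* right *)
       else None)
  | Var _ | Abs _ -> None
```

At most one of the three rules applies to a given term, which is why step can return an option rather than a list of reducts.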


The definition of the predicate halts is the same as above and one can easily 
show that it is preserved under reduction: 


↝-halts : ∀ {Γ A} {t t' : Γ ⊢ A} → t ↝ t' → halts t → halts t'
↝-halts r (sn h) = h r


In fact this could also be shown with the general β-reduction. The novelty
brought by using call-by-value reduction is that it is deterministic, and we thus 
now also have the “converse” property: 


↜-halts : ∀ {Γ A} {t t' : Γ ⊢ A} → t ↝ t' → halts t' → halts t
↜-halts r h = sn (λ r' → subst halts (det r r') h)


Finally, the induction principle ↝-rec presented for β-reduction of course still
holds with call-by-value reduction.

We take the following variant of the definition of reducibility candidates 
(note that we suppose that t is halting in both cases): 


R : {Γ : Ctxt} {A : Type} (t : Γ ⊢ A) → Set
R {Γ} {X _}   t = halts t
R {Γ} {A ⇒ B} t = halts t × ({u : Γ ⊢ A} → R u → R (t · u))


The three properties of candidates can now be proved independently and in a 
much simpler way (in particular, we do not need the x? hack that we used 
above): 


CR1 : ∀ {Γ A} {t : Γ ⊢ A} → R t → halts t
CR1 {Γ} {X x}       r = r
CR1 {Γ} {A ⇒ B} {t} r = fst r


CR2 : ∀ {Γ A} {t t' : Γ ⊢ A} → R t → t ↝ t' → R t'
CR2 {Γ} {X x} r b = ↝-halts b r
CR2 {Γ} {A ⇒ B} {t} r b =
  ↝-halts b (fst r) , (λ {u} Ru → CR2 (snd r Ru) (↝l b u))


CR3 : ∀ {Γ A} {t t' : Γ ⊢ A} → t ↝ t' → R t' → R t
CR3 {Γ} {X x} b r = ↜-halts b r
CR3 {Γ} {A ⇒ B} b r =
  (↜-halts b (fst r)) , (λ {u} Ru → CR3 (↝l b u) (snd r Ru))


7.5.3 Normalization by evaluation. We finally present a way to compute
the normal form of terms in simply typed λ-calculus using normalization by
evaluation, see section 3.5.2. The idea is that we are going to interpret λ-terms
as Agda functions on normal forms, so that we can use Agda’s built-in reduction
mechanism, and then translate the result back to λ-calculus. More precisely,
given a type A, we write ⟦A⟧ for the set defined inductively by


— ⟦X⟧ = NF_X: the set associated to a type variable X is the set of all terms of
type X in normal form,
— ⟦A → B⟧ = ⟦A⟧ → ⟦B⟧: the set associated to the arrow type A → B is
the set of all functions from ⟦A⟧ to ⟦B⟧.


Then, we are going to interpret a term t of type A as an element ⟦t⟧_ρ of ⟦A⟧.
Since we need to properly take care of the free variables of t, our interpretation
also depends on an environment ρ, which is a function which to every free
variable of t associates a term in normal form. The interpretation is defined by
induction on t by


\[
\llbracket x \rrbracket_\rho = \rho(x)
\qquad
\llbracket \lambda x.t \rrbracket_\rho = u \mapsto \llbracket t \rrbracket_{\rho, x \mapsto u}
\qquad
\llbracket t\,u \rrbracket_\rho = \llbracket t \rrbracket_\rho(\llbracket u \rrbracket_\rho)
\]


(above, (ρ, x ↦ u) is the environment which behaves as ρ, except that it
associates u to x). In other words, we evaluate the term t in the environment ρ.
Finally, we are going to define, for every type A, a reification function ↓^A, which
translates every element of the set ⟦A⟧ to a λ-term in normal form. The
definition will of course be performed by induction on the type A. In order to handle
the case where A is an arrow type, it turns out that we also need a reflection
function ↑^A which allows us to see a variable of type A as an element of ⟦A⟧.
Actually, in order to be able to perform the definition of ↑^A by induction, we need
to define it on all neutral terms and not only variables, and define it together
with reification. We thus define two functions


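\[
{\downarrow^A} : \llbracket A \rrbracket \to \mathrm{NF}_A
\qquad\qquad
{\uparrow^A} : \mathrm{NE}_A \to \llbracket A \rrbracket
\]


where NF_A and NE_A are respectively the normal forms and neutral terms of
type A, by induction on A by:


\[
\begin{array}{ll}
{\downarrow^X} t = t & {\uparrow^X} t = t \\
{\downarrow^{A \to B}} f = \lambda x.\, {\downarrow^B} (f ({\uparrow^A} x)) &
{\uparrow^{A \to B}} t = u \mapsto {\uparrow^B} (t\, ({\downarrow^A} u))
\end{array}
\]


where x is supposed to be “fresh” in the lower left case. Finally, we can compute
the normal form t̂ of a λ-term t of type A by


\[
\hat t = {\downarrow^A} \llbracket t \rrbracket_{\rho_0}
\]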


where ρ₀ is the “trivial environment” which to a variable x associates the
variable x.
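

Before turning to the Agda formalization, the definitions above can be tried out in OCaml. The following is only a sketch under simplifying assumptions: terms use named variables, fresh variables are generated by a global counter (and are assumed not to clash with the variables of the input), and ill-typed inputs simply raise an exception. The names sem, reify, reflect, eval and normalize are ours.

```ocaml
type ty = X of int | Arr of ty * ty
type term = Var of string | Abs of string * term | App of term * term

(* Elements of the sets ⟦A⟧: terms at base types, functions at arrow types. *)
type sem = Tm of term | Fun of (sem -> sem)

(* Fresh variable names x1, x2, ... (assumed unused in the input). *)
let fresh =
  let n = ref 0 in
  fun () -> incr n; "x" ^ string_of_int !n

(* Reification ↓^A and reflection ↑^A, by mutual induction on the type. *)
let rec reify a v =
  match a, v with
  | X _, Tm t -> t
  | Arr (a, b), Fun f ->
    let x = fresh () in
    Abs (x, reify b (f (reflect a (Var x))))
  | _ -> failwith "reify: ill-typed value"
and reflect a t =
  match a with
  | X _ -> Tm t
  | Arr (a, b) -> Fun (fun u -> reflect b (App (t, reify a u)))

(* Interpretation ⟦t⟧_ρ, the environment being an association list. *)
let rec eval env = function
  | Var x -> List.assoc x env
  | Abs (x, t) -> Fun (fun u -> eval ((x, u) :: env) t)
  | App (t, u) ->
    (match eval env t with
     | Fun f -> f (eval env u)
     | Tm _ -> failwith "eval: ill-typed application")

(* Normal form of t of type a, ctx listing the free variables with types. *)
let normalize ctx a t =
  let env = List.map (fun (x, b) -> (x, reflect b (Var x))) ctx in
  reify a (eval env t)
```

For instance, normalizing (λx.λy.x) x at type X₁ → X₀ in the context x : X₀ yields an abstraction over a fresh variable whose body is x.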


Terms. Our actual formalization is inspired by [Arn17]. We use the same
definitions as above for types, contexts, and λ-terms. Inspired by the notation for
bidirectional typechecking (section 4.4.5), we write Γ ⇇ A (resp. Γ ⇉ A) for the
type of normal forms (resp. neutral terms) of type A in the context Γ, defined
as the following inductive types:


data _⇇_ : Ctxt → Type → Set
data _⇉_ : Ctxt → Type → Set


data _⇇_ where
  abs : ∀ {Γ A B} → Γ , A ⇇ B → Γ ⇇ A ⇒ B
  neu : ∀ {Γ A} → Γ ⇉ A → Γ ⇇ A
data _⇉_ where
  var : ∀ {Γ A} → Γ ∋ A → Γ ⇉ A
  app : ∀ {Γ A B} → Γ ⇉ A ⇒ B → Γ ⇇ A → Γ ⇉ B
Note that those are not characterized here by a predicate on terms as before,
but rather implemented as a new inductive type. For this reason, we need to
implement again substitution on those types:


_⇇[_] : ∀ {Γ Δ A} → Γ ⇇ A → Γ ⊆ Δ → Δ ⇇ A
_⇉[_] : ∀ {Γ Δ A} → Γ ⇉ A → Γ ⊆ Δ → Δ ⇉ A
abs t   ⇇[ σ ] = abs (t ⇇[ keep σ ])
neu t   ⇇[ σ ] = neu (t ⇉[ σ ])
var x   ⇉[ σ ] = var (x v[ σ ])
app t u ⇉[ σ ] = app (t ⇉[ σ ]) (u ⇇[ σ ])


where the case of variables is handled by


_v[_] : ∀ {Γ Δ A} → Γ ∋ A → Γ ⊆ Δ → Δ ∋ A
zero  v[ keep σ ] = zero
suc x v[ keep σ ] = suc (x v[ σ ])
x     v[ drop σ ] = suc (x v[ σ ])


Interpreting types. The interpretation of types as sets of terms is performed
following the above definition. We actually need this definition to also depend
on a context and write ⟦ Γ ⊢ A ⟧ for the interpretation of the type A in the
context Γ, which is defined by


⟦_⊢_⟧ : Ctxt → Type → Set
⟦ Γ ⊢ X i ⟧   = Γ ⇉ X i
⟦ Γ ⊢ A ⇒ B ⟧ = ∀ {Δ} → Γ ⊆ Δ → ⟦ Δ ⊢ A ⟧ → ⟦ Δ ⊢ B ⟧


In the second case, we need to incorporate the weakening of context in the def- 
inition in order to be able to produce “fresh variables” when reifying functions. 


Reflection and reification. The reflection and reification functions are defined
by mutual induction by following their definition given above. We also need to
define a function Var which is the variable corresponding to the last element of
the context in the set ⟦ Γ , A ⊢ A ⟧:


Var : ∀ {Γ A} → ⟦ Γ , A ⊢ A ⟧
↑   : ∀ {Γ A} → Γ ⇉ A → ⟦ Γ ⊢ A ⟧
↓   : ∀ {Γ A} → ⟦ Γ ⊢ A ⟧ → Γ ⇇ A


Var {Γ} {X i}       = var zero
Var {Γ} {A ⇒ B} σ t = ↑ ((var zero) ⇉[ σ ]) ⊆-refl t


↑ {Γ} {X i}   t     = t
↑ {Γ} {A ⇒ B} t σ u = ↑ (app (t ⇉[ σ ]) (↓ u))


↓ {Γ} {X i}   t = neu t
↓ {Γ} {A ⇒ B} f = abs (↓ (f (drop ⊆-refl) Var))


Interpreting terms. We finally need to define the interpretation ⟦t⟧_ρ of terms t,
for which we first need to introduce the notion of environment, which will be
done in a typeful fashion here. Given contexts Γ and Δ, we write ⟦ Δ ⊢* Γ ⟧ for
the type of environments which to every variable of type A in Γ associate an
element of ⟦ Δ ⊢ A ⟧:


⟦_⊢*_⟧ : Ctxt → Ctxt → Set
⟦ Δ ⊢* Γ ⟧ = ∀ {A} (x : Γ ∋ A) → ⟦ Δ ⊢ A ⟧


These are the environments adapted to terms whose free variables are in Γ. We
can define the interpretation of terms following the above definition by


⟦_⟧ : ∀ {Γ Δ A} → Γ ⊢ A → ⟦ Δ ⊢* Γ ⟧ → ⟦ Δ ⊢ A ⟧
⟦ var x ⟧ ρ = ρ x
⟦ t · u ⟧ ρ = (⟦ t ⟧ ρ) ⊆-refl (⟦ u ⟧ ρ)
⟦ ƛ t ⟧   ρ = λ σ u → ⟦ t ⟧ ((λ x → wk* σ (ρ x)) ,* u)


In this definition, we have used the following auxiliary function, which extends 
an environment with a new value 


_,*_ : ∀ {Γ Δ A} → ⟦ Δ ⊢* Γ ⟧ → ⟦ Δ ⊢ A ⟧ → ⟦ Δ ⊢* (Γ , A) ⟧
(ρ ,* t) zero    = t
(ρ ,* t) (suc x) = ρ x


as well as the following weakening principle for sets of normal forms: 


wk* : ∀ {Γ Δ A} → Γ ⊆ Δ → ⟦ Γ ⊢ A ⟧ → ⟦ Δ ⊢ A ⟧
wk* {Γ} {Δ} {X i}   σ t = t ⇉[ σ ]
wk* {Γ} {Δ} {A ⇒ B} σ f = λ τ t → f (⊆-trans σ τ) t


Computing normal forms. We can finally define the normalization of λ-terms
by


normalize : ∀ {Γ A} → Γ ⊢ A → Γ ⇇ A
normalize t = ↓ (⟦ t ⟧ id*)


where the trivial environment is


id* : ∀ {Γ} → ⟦ Γ ⊢* Γ ⟧
id* x = ↑ (var x)

An example. For instance, we can define the term t = (λx.λy.x) x, which is
typed as x : X₀ ⊢ t : X₁ → X₀, by


K : ∅ , X 0 ⊢ X 0 ⇒ X 1 ⇒ X 0
K = ƛ (ƛ var (suc zero))


V : ∅ , X 0 ⊢ X 0
V = var zero


t : ∅ , X 0 ⊢ X 1 ⇒ X 0
t = K · V


If we ask Agda to compute (i.e. normalize) the normalized term normalize t, 
we obtain 


abs (neu (var (suc zero))) 


which is Agda’s way of saying λy.x, as expected.




Handling η-conversion. The above algorithm can be used in order to test whether
two λ-terms are β-convertible: in order to know whether t and u are
convertible, we simply need to check whether their respective normal forms t̂ and û are
equal. However this does not work if we want to test for βη-convertibility. For
instance, the terms t = λx.λy.x y and u = λx.x, of type (X → Y) → X → Y,
are η-convertible and in normal form: we have t̂ = t ≠ u = û. In order to
overcome this problem, a nice solution consists in slightly tightening the notion
of normal form we consider, and requiring that terms with an arrow type should
be abstractions: normal forms satisfying this are called η-long normal forms.
In the definition of normal forms, this amounts to considering neutral
terms as normal ones only when they have base types (and not arrow types),
i.e. the definition of normal forms becomes


data _⇇_ where
  abs : ∀ {Γ A B} → Γ , A ⇇ B → Γ ⇇ A ⇒ B
  neu : ∀ {Γ i} → Γ ⇉ X i → Γ ⇇ X i


Exercise 7.5.3.1. Modify the above normalization by evaluation algorithm in
order to compute η-long normal forms.


CHAPTER 8 


Dependent type theory 


We now introduce the logic we have seen at work in Agda. The type theory that 
we are presenting here was originally introduced by Martin-Löf in 1972 [ML75,
ML82, ML98], most of Martin-Löf’s work being freely accessible at [ML]. Its
types are said to be dependent because they can depend on values. For instance, 
we can define a type Vec n of lists of length n, which depends on the natural 
number n. Another major feature of this type theory is that we can manipulate 
types as any other data: for instance, we can define functions which create types 
from other types, etc. In order to make this possible, the distinction between 
types and terms is dropped: types are simply the terms which admit a particular 
type, called “Type”. Making all this work together nicely requires quite some 
care. 

The core of the type theory is presented in section 8.1, universes being added 
in section 8.2, other usual type constructors in section 8.3 and inductive types 
in section 8.4. The ways a dependent proof assistant can be implemented are
discussed in section 8.5.


8.1 Core dependent type theory 


In this section, we begin with the “minimal” version of dependent type theory, 
i.e. with (dependently typed) functions only. This is extended with more type 
constructors in section 8.3. 


8.1.1 Expressions. As indicated above, there is no distinction between terms 
and types and we call them both “expressions” in order to make this clear. As 
usual, we suppose fixed an infinite countable supply of variables. An expression e 
is a term of the form 


\[
e, e' ::= x \mid e\,e' \mid \lambda x^{e}.e' \mid \Pi(x : e).e' \mid \mathrm{Type}
\]


In the following, we keep the old habit of writing t and A for expressions thought 
of as terms and as types, even though we cannot syntactically distinguish be- 
tween both. The expressions can be read as follows: 


— x: a term or a type variable,

— t u: application of a term to a term (or a type),

— λx^A.t: the function (the λ-term) which to an element x of type A
associates t,

— Π(x : A).B: the type of (dependent) functions from A to B,

— Type: the type of all types.


In Agda notation, Π(x : A).B is written (x : A) → B and Type is written Set.
8.1.2 Free variables and substitution. In an expression of the form λx^A.t
(resp. Π(x : A).B), the variable x is said to be bound in t (resp. in B), and
expressions are considered modulo renaming of bound variables, which is called
α-equivalence. A variable which is not bound is free and we write FV(e) for the
set of free variables of an expression e, which is defined by


\[
\begin{aligned}
\mathrm{FV}(x) &= \{x\} \\
\mathrm{FV}(t\,u) &= \mathrm{FV}(t) \cup \mathrm{FV}(u) \\
\mathrm{FV}(\lambda x^A.t) &= \mathrm{FV}(A) \cup (\mathrm{FV}(t) \setminus \{x\}) \\
\mathrm{FV}(\Pi(x : A).B) &= \mathrm{FV}(A) \cup (\mathrm{FV}(B) \setminus \{x\}) \\
\mathrm{FV}(\mathrm{Type}) &= \emptyset
\end{aligned}
\]


We say that a variable x occurs in an expression A when x ∈ FV(A).
Given expressions e and u and a variable x, we define the substitution e[u/x]
of x by u in e by induction on e:


\[
\begin{aligned}
x[u/x] &= u \\
y[u/x] &= y && \text{if } x \neq y \\
(t\,t')[u/x] &= (t[u/x])\,(t'[u/x]) \\
(\lambda y^A.t)[u/x] &= \lambda y^{A[u/x]}.t[u/x] && \text{with } y \notin \mathrm{FV}(u) \cup \{x\} \\
(\Pi(y : A).B)[u/x] &= \Pi(y : A[u/x]).B[u/x] && \text{with } y \notin \mathrm{FV}(u) \cup \{x\} \\
\mathrm{Type}[u/x] &= \mathrm{Type}
\end{aligned}
\]
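

These two definitions translate quite directly to OCaml. The following sketch is only one possible implementation: the type expr and the functions fv, fresh, subst and binder are names of our own, and the side condition on bound variables is enforced by renaming the bound variable with primes when needed, instead of working up to α-equivalence.

```ocaml
type expr =
  | Var of string
  | App of expr * expr
  | Abs of string * expr * expr   (* λx^A.t *)
  | Pi of string * expr * expr    (* Π(x : A).B *)
  | Type

module S = Set.Make (String)

(* Free variables of an expression. *)
let rec fv = function
  | Var x -> S.singleton x
  | App (t, u) -> S.union (fv t) (fv u)
  | Abs (x, a, t) | Pi (x, a, t) -> S.union (fv a) (S.remove x (fv t))
  | Type -> S.empty

(* A variant of x, obtained by adding primes, which is not in avoid. *)
let rec fresh avoid x = if S.mem x avoid then fresh avoid (x ^ "'") else x

(* Capture-avoiding substitution e[u/x]. *)
let rec subst x u e =
  match e with
  | Var y -> if y = x then u else e
  | App (t, t') -> App (subst x u t, subst x u t')
  | Abs (y, a, t) ->
    let y', t' = binder x u y t in
    Abs (y', subst x u a, t')
  | Pi (y, a, b) ->
    let y', b' = binder x u y b in
    Pi (y', subst x u a, b')
  | Type -> Type

(* Substitution under a binder y, renaming y when it would capture a
   free variable of u. *)
and binder x u y body =
  if y = x then (y, body) (* x is shadowed by the binder *)
  else if S.mem y (fv u) then
    let y' = fresh (S.add x (S.union (fv u) (fv body))) y in
    (y', subst x u (subst y (Var y') body))
  else (y, subst x u body)
```

For instance, substituting y for x in λy^Type.x y renames the bound variable, so that the substituted y remains free in the result.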


8.1.3 Contexts. A context Γ is a list


\[
\Gamma = x_1 : A_1, \ldots, x_n : A_n
\]


where the x_i are variables and the A_i are expressions. We sometimes write ∅
for the empty context, although we usually omit writing it. The set of free
variables of a context is defined by


\[
\mathrm{FV}(\Gamma) = \bigcup_{i=1}^{n} \mathrm{FV}(A_i)
\]


and we extend the operation of substitution to contexts by setting


\[
\Gamma[t/x] = x_1 : A_1[t/x], \ldots, x_n : A_n[t/x]
\]


whenever no variable x_i occurs in t.

Unlike the case of simply-typed λ-calculus, it might happen that a context
is not well-formed, in the sense that we do not expect it to make sense. For
instance, if Vec n is the type of vectors of length n, the context


n : Nat, l : Vec n


which declares that n is a natural number and l is a vector of length n, is
well-formed. However, the context


l : Vec n, n : Nat


is not well-formed: we begin by declaring that l is a vector of length n, without
having declared what n should be before: the order in which variables are
declared now really matters. Similarly, the context


n : Bool, l : Vec n


is not well-formed: the type Vec n only makes sense when n is a natural number,
and not a boolean.


8.1.4 Definitional equality. In order to have a manageable type theory, we 
should identify some terms. In particular, we want to identify terms which are 
β-equivalent. For instance, suppose that we have a function f whose type is
Vec(2 + 2) → A, taking a vector of length 2 + 2 as argument and returning a
term of type A, and a term t of type Vec(3+1). We expect to be able to apply f 
to t even though the types do not match precisely: the term t can be thought 
of as having the type Vec(2 + 2), because we all know that 2+ 2=3+1. This 
means that in types, we consider terms up to some equivalence relation, called 
convertibility or definitional equality, which usually only consists in reduction. 

Although we will formalize this definitional equality as an equivalence rela- 
tion, we need some more properties on it: we need to be able to decide whether 
two terms are equivalent or not. In practice, this is performed by generalizing 
the method described in section 4.2.4 to test the β-convertibility of λ-terms: we
orient the equivalence in a way giving rise to a convergent (i.e. terminating and
confluent) relation, so that two terms are equivalent if and only if they have the
same normal form. For instance, if our addition satisfies (n+1)+m = n+(m+1)
and 0 + m = m, we orient those relations as (n + 1) + m ⇝ n + (m + 1) and
0 + m ⇝ m. In order to know whether two expressions involving sums of natural
numbers are equal, we can then apply those relations as much as possible, and
compare the resulting expressions for equality. For instance,


\[
2 + 2 \rightsquigarrow 1 + 3 \rightsquigarrow 0 + 4 \rightsquigarrow 4
\qquad\text{and}\qquad
1 + 3 \rightsquigarrow 0 + 4 \rightsquigarrow 4
\]


and therefore the two terms are equivalent because they have the same normal 
form 4. 
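

This decision procedure is easy to experiment with. Here is a minimal OCaml sketch for sums of unary natural numbers, using exactly the two oriented rules above; the type nat and the functions step, normalize, convertible and of_int are illustrative names of our own.

```ocaml
(* Expressions built from zero, successor and addition. *)
type nat = Z | S of nat | Add of nat * nat

(* One rewriting step, None when the expression is in normal form. *)
let rec step = function
  | Add (Z, m) -> Some m                      (* 0 + m ⇝ m *)
  | Add (S n, m) -> Some (Add (n, S m))       (* (n+1) + m ⇝ n + (m+1) *)
  | Add (n, m) -> Option.map (fun n' -> Add (n', m)) (step n)
  | S n -> Option.map (fun n' -> S n') (step n)
  | Z -> None

(* Apply the rules as much as possible. *)
let rec normalize e =
  match step e with
  | Some e' -> normalize e'
  | None -> e

(* Two expressions are convertible when they have the same normal form. *)
let convertible a b = normalize a = normalize b

(* The numeral associated to a non-negative integer. *)
let rec of_int k = if k = 0 then Z else S (of_int (k - 1))
```

With these definitions, convertible (Add (of_int 2, of_int 2)) (Add (of_int 1, of_int 3)) holds, both sides having the numeral 4 as normal form.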


8.1.5 Sequents. In order to take all of this into account, we need to have three
different forms of judgments in the sequent calculus:


— Γ ⊢ means that Γ is a well-formed context,


— Γ ⊢ t : A means that t has type A in the context Γ,


— Γ ⊢ t = u : A means that t and u are equal (i.e. convertible) terms of
type A in the context Γ.


As usual, we will give rules which allow the derivation of those judgments 
through derivation trees. The derivation rules for all these three kinds of judg- 
ments mutually depend on each other, so that they all have to be defined at 
once. 

As indicated above, there is no syntactic distinction between terms and 
types: both are expressions. The logic will however allow us to distinguish 
between the two. An expression A for which Γ ⊢ A : Type is derivable for some
context Γ is called a type. An expression t for which Γ ⊢ t : A is derivable for
some context Γ and type A is called a term.
8.1.6 Rules for contexts. There are two rules for contexts:


\[
\frac{}{\emptyset \vdash}
\qquad\qquad
\frac{\Gamma \vdash A : \mathrm{Type}}{\Gamma, x : A \vdash}
\]


The first one states that the empty context ∅ is always well-formed. The second
one states that if A is a well-formed type in a context Γ, then Γ, x : A is a
well-formed context. In the second rule, one would expect that we require that Γ is
a well-formed context as a premise, as in


\[
\frac{\Gamma \vdash \qquad \Gamma \vdash A : \mathrm{Type}}{\Gamma, x : A \vdash}
\]


but we will see in section 8.1.11 that from the premise Γ ⊢ A : Type, we
will actually be able to deduce that Γ is a well-formed context (and similar
observations could be made on subsequent rules). As indicated above, the reason
why we need to ensure that A is a well-formed type in the context Γ is to avoid
considering a context such as


n : Bool, l : Vec n


as a well-formed context. Namely, the rules will not allow us to derive


n : Bool ⊢ Vec n : Type


i.e. that Vec n is a well-formed type in a context where n is a boolean.


8.1.7 Rules for equality. We now give the rules for definitional equality. 
First, we have three rules ensuring that equality is an equivalence relation, 
by respectively imposing reflexivity, symmetry and transitivity: 
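\[
\frac{\Gamma \vdash t : A}{\Gamma \vdash t = t : A}
\qquad
\frac{\Gamma \vdash t = u : A}{\Gamma \vdash u = t : A}
\qquad
\frac{\Gamma \vdash t = u : A \qquad \Gamma \vdash u = v : A}{\Gamma \vdash t = v : A}
\]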


We will need that the definitional equality is not only an equivalence relation, 
but a congruence: rules expressing compatibility with type constructors will be 
added later on for each type constructor. 

Finally, we add rules expressing the fact that a type can be substituted by 
an equal one in a typing derivation: 


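\[
\frac{\Gamma \vdash t : A \qquad \Gamma \vdash A = B : \mathrm{Type}}{\Gamma \vdash t : B}
\qquad
\frac{\Gamma \vdash t = u : A \qquad \Gamma \vdash A = B : \mathrm{Type}}{\Gamma \vdash t = u : B}
\]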


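Example 8.1.7.1. The example of section 8.1.4, where a function f expecting an
argument of type Vec(2 + 2) is applied to an argument l of type Vec(1 + 3), can
be typed using these conversion rules as follows:


\[
\frac{
  \ldots \vdash f : \mathrm{Vec}(2+2) \to A
  \qquad
  \dfrac{
    \ldots \vdash l : \mathrm{Vec}(1+3)
    \qquad
    \dfrac{
      \dfrac{\vdots}{\ldots \vdash 1+3 = 2+2 : \mathrm{Nat}}
    }{\ldots \vdash \mathrm{Vec}(1+3) = \mathrm{Vec}(2+2) : \mathrm{Type}}
  }{\ldots \vdash l : \mathrm{Vec}(2+2)}
}{f : \mathrm{Vec}(2+2) \to A,\ l : \mathrm{Vec}(1+3) \vdash f\,l : A}
\]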
8.1.8 Axiom rule. We now turn to rules allowing the typing of a term. The
axiom rule is


\[
\frac{\Gamma, x : A, \Gamma' \vdash}{\Gamma, x : A, \Gamma' \vdash x : A}\ (\mathrm{ax})
\]


with the following side conditions:
— x ∉ dom(Γ′), and
— FV(A) ∩ dom(Γ′) = ∅.


We follow the convention that a variable always refers to the rightmost
occurrence of the variable in a context. With this in mind, the side conditions avoid
clearly wrong derivations such as


\[
\frac{}{x : A, x : B \vdash x : A}\ (\mathrm{ax})
\qquad\qquad
\frac{}{n : \mathrm{Nat}, l : \mathrm{Vec}\,n, n : \mathrm{Bool} \vdash l : \mathrm{Vec}\,n}\ (\mathrm{ax})
\]


Alternatively, we could use the convention that the variables declared in a
context are always distinct, which we can always do because we consider terms up
to α-conversion, although this is a bad habit because we do not want to spend
our time performing α-conversions when implementing a proof assistant.


8.1.9 Terms and rules for type constructors. We now give the rules for
Π-types, which are generalized function types. As for any type constructor in
this type theory, we will need to have three constructions for expressions: 


— a constructor for the type, 
— a constructor for the terms of this type, 
— an eliminator for the terms of this type, 
together with six rules with the following purpose 
— formation: construct a type with the type constructor, 
— introduction: construct a term of the type, 
— elimination: use a term of the type, 
— computation: (β-)reduce a term of the type,


— uniqueness: express a uniqueness property of the constructed terms, which 
corresponds to an η-equivalence rule,


— congruence: express that definitional equality is compatible with the term 
constructors. 


We insist here on this structure because it will be the same for all the subsequent 
type constructors that we are going to see in section 8.3. Let’s see that in action 
for Π-types.
8.1.10 Rules for Π-types. The Π-types are dependent function types: they
are like the plain old function types, except that the type of the result might
depend on the argument. Such a type is written


Π(x : A).B


which corresponds to the Agda notation


(x : A) → B


and should be read as the type of functions taking an argument x of type A and
returning a value of type B. Here, the variable x might occur in the type B,
i.e. the type B can depend on x. For instance, a function taking a natural
number n as argument and returning a vector of length n will have the type


Π(n : Nat). Vec n


see section 6.4.7 for actual uses of such functions. In a Π-type as above, the
variable x is bound in the type B, and we can rename bound variables. For
instance, the previous type is α-equivalent to Π(m : Nat). Vec m. From a logical
point of view, a type Π(x : A).B can be read as a universal quantification


∀x ∈ A.B


If we follow the lists given in section 8.1.9, the corresponding constructors
for expressions are


— the constructor for types: Π,
— the constructor for terms: the λ-abstraction,
— the eliminator for terms: the application.


Finally, we can give the six required rules for Π-types.


Formation. The type formation rule is


\[
\frac{\Gamma \vdash A : \mathrm{Type} \qquad \Gamma, x : A \vdash B : \mathrm{Type}}{\Gamma \vdash \Pi(x : A).B : \mathrm{Type}}\ (\Pi_{\mathrm{F}})
\]


and allows constructing a type Π(x : A).B whenever A and B are well-formed
types.

Introduction. The introduction rule is


\[
\frac{\Gamma, x : A \vdash t : B}{\Gamma \vdash \lambda x^A.t : \Pi(x : A).B}\ (\Pi_{\mathrm{I}})
\]


and states that a λ-abstraction λx^A.t is a function taking an argument x of
type A and returning a term t of some type B: it should thus have the type
Π(x : A).B.
Elimination. The elimination rule is


\[
\frac{\Gamma \vdash t : \Pi(x : A).B \qquad \Gamma \vdash u : A}{\Gamma \vdash t\,u : B[u/x]}\ (\Pi_{\mathrm{E}})
\]


and states that if t is a function of type Π(x : A).B and u is an argument of
type A then we can apply t to u. Again, note that the type B can depend on x,
so that the type of the result t u should be B where x has been replaced by the
actual value u of the argument.


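Computation. The computation rule is


\[
\frac{\Gamma, x : A \vdash t : B \qquad \Gamma \vdash u : A}{\Gamma \vdash (\lambda x^A.t)\,u = t[u/x] : B[u/x]}\ (\Pi_{\mathrm{C}})
\]


this is precisely the β-reduction rule.

Uniqueness. The uniqueness rule is


\[
\frac{\Gamma \vdash t : \Pi(x : A).B}{\Gamma \vdash t = \lambda x^A.t\,x : \Pi(x : A).B}\ (\Pi_{\mathrm{U}})
\]


this is precisely the η-expansion rule.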
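Congruence. The three congruence rules are


\[
\frac{\Gamma \vdash A = A' : \mathrm{Type} \qquad \Gamma, x : A \vdash B = B' : \mathrm{Type}}{\Gamma \vdash \Pi(x : A).B = \Pi(x : A').B' : \mathrm{Type}}
\]


\[
\frac{\Gamma \vdash A = A' : \mathrm{Type} \qquad \Gamma, x : A \vdash B : \mathrm{Type} \qquad \Gamma, x : A \vdash t = t' : B}{\Gamma \vdash \lambda x^A.t = \lambda x^{A'}.t' : \Pi(x : A).B}
\]


and


\[
\frac{\Gamma \vdash t = t' : \Pi(x : A).B \qquad \Gamma \vdash u = u' : A}{\Gamma \vdash t\,u = t'\,u' : B[u/x]}
\]


They express the expected compatibility of equality with all the constructors
for expressions: Π-types, λ-abstractions, and applications. We will generally
omit the congruence rules in the following, but they should be formulated in a
similar way for every constructor.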


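Example 8.1.10.1. The polymorphic identity function, which takes a type A and
returns the identity function from A to A can be typed as follows:


\[
\dfrac{
  \dfrac{
    \dfrac{A : \mathrm{Type}, x : A \vdash}{A : \mathrm{Type}, x : A \vdash x : A}\ (\mathrm{ax})
  }{A : \mathrm{Type} \vdash \lambda x^A.x : \Pi(x : A).A}\ (\Pi_{\mathrm{I}})
}{\vdash \lambda A^{\mathrm{Type}}.\lambda x^A.x : \Pi(A : \mathrm{Type}).\Pi(x : A).A}\ (\Pi_{\mathrm{I}})
\]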
Arrow types. The traditional arrow type A → B can be recovered as the
particular case of a Π-type Π(x : A).B which is not dependent, meaning that x does
not occur as a free variable in B. We thus write


A → B = Π(_ : A).B


where “_” is a variable name which is supposed to never occur in any type; in
particular, we always have B[t/_] = B. It can be checked that all the rules give
back the usual ones, up to notation. For instance, (Π_E) allows us to recover the
elimination rule:


\[
\frac{\Gamma \vdash t : A \to B \qquad \Gamma \vdash u : A}{\Gamma \vdash t\,u : B}\ (\to_{\mathrm{E}})
\]


8.1.11 Admissible rules. Many basic properties of the logical system can be 
expressed as the admissibility of some rules, some of which we now present. We 
concentrate on typing rules, i.e. judgments of the form Γ ⊢ t : A, but similar
admissible rules can usually be formulated for the two other kinds of judgments:
well-formation of contexts (Γ ⊢) and convertibility (Γ ⊢ t = u : A), details being
left to the reader. The proofs are, as usual, performed by induction on the 
derivation of the judgment in the premise. 

Before stating those, we first make the following simple, but useful, obser- 
vation: 


Lemma 8.1.11.1. For every derivable sequent Γ ⊢ t : A, we have the inclusions
FV(t) ⊆ dom(Γ) and FV(A) ⊆ dom(Γ).


Proof. By induction on the derivation of the sequent. 


Basic checks. The rules ensure that only well-formed types and contexts can be 
manipulated at any point in a proof. This can be formulated as the admissibility 
of the following rules: 


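\[
\frac{\Gamma \vdash t : A}{\Gamma \vdash}
\qquad
\frac{\Gamma \vdash t : A}{\Gamma \vdash A : \mathrm{Type}}
\]


\[
\frac{\Gamma \vdash t = u : A}{\Gamma \vdash}
\qquad
\frac{\Gamma \vdash t = u : A}{\Gamma \vdash t : A}
\qquad
\frac{\Gamma \vdash t = u : A}{\Gamma \vdash u : A}
\qquad
\frac{\Gamma \vdash t = u : A}{\Gamma \vdash A : \mathrm{Type}}
\]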


To be honest the admissibility of those rules is “almost” true: this will be 
discussed in section 8.2. 


Weakening rule. The following weakening rule is admissible, accounting for the
fact that if some typing judgment holds in some context, it also holds with more
hypotheses in the context.


\[
\frac{\Gamma \vdash A : \mathrm{Type} \qquad \Gamma, \Gamma' \vdash t : B}{\Gamma, x : A, \Gamma' \vdash t : B}\ (\mathrm{wk})
\]


with x ∉ FV(Γ′) ∪ FV(t) ∪ FV(B).
Exchange rule. The exchange rule states that we can swap two entries x : A and
y : B in a context, provided that there is no dependency between them, i.e. B
does not have x as free variable:


\[
\frac{\Gamma \vdash B : \mathrm{Type} \qquad \Gamma, x : A, y : B, \Delta \vdash t : C}{\Gamma, y : B, x : A, \Delta \vdash t : C}
\]


Here, the hypothesis Γ ⊢ B : Type ensures that B does not depend on x by
lemma 8.1.11.1.


Cut rule. The type theory has the cut elimination property, which corresponds
to the admissibility of the following rule:


\[
\frac{\Gamma \vdash t : A \qquad \Gamma, x : A, \Delta \vdash u : B}{\Gamma, \Delta[t/x] \vdash u[t/x] : B[t/x]}\ (\mathrm{cut})
\]


see sections 2.3.3 and 4.1.8.


8.2 Universes 


8.2.1 The type of Type. There is one missing thing in the type theory we have
given up to now. Everything should have a type in the sequents we manipulate,
but the constant Type does not, because there is no rule allowing us to give it one.
For instance, in order to type the polymorphic identity in example 8.1.10.1, we
have to show that the context of the sequent


A : Type, x : A ⊢ x : A


is well-formed, which will at some point require showing


⊢ Type : Type


which we have no rule to derive for now.
There is an obvious candidate for the rule we are lacking: we are tempted
to add the rule


\[
\frac{\Gamma \vdash}{\Gamma \vdash \mathrm{Type} : \mathrm{Type}}
\]


which is sometimes called the type-in-type rule. This rule was in fact present
in the original Martin-Löf type system, but Girard showed that the resulting
system was inconsistent [Gir72]. A variant of this proof is presented below.


8.2.2 Russell’s paradox in type theory. We show the inconsistency with 
the above rule for Type by encoding, in Agda, Russell’s paradox presented in 
section 5.3.1. 


Encoding finite sets in OCaml. As a starter let’s first see how to implement 
finite sets in OCaml. A finite set 


A= {a1,..., an} 


whose elements a_i belong to some fixed type 'a, can be described by giving its
elements: we can encode it as an array of elements. We thus define the type of
sets of elements of 'a as


type 'a finset = Finset of 'a array


which is a type, with one constructor, which takes an array of 'a as argument.
This is thus essentially an array of 'a, and the usefulness of the constructor
shall be explained below. It should be noted that this representation is not
faithful: two arrays differing only by the order of their elements or repetitions
of elements represent the same set. For instance, the arrays


[|3;1;2|] and [|2;1;3;1;2;2|]


both encode the set {1, 2, 3}.
We can code the function which determines whether an element x belongs 
to a set a, by looking whether the element occurs in the array: 


let mem (x : 'a) (Finset a : 'a finset) =
  let ans = ref false in
  for i = 0 to Array.length a - 1 do
    if a.(i) = x then ans := true
  done;
  !ans


or, more elegantly, using the standard library, as 


let mem (x : 'a) (Finset a : 'a finset) =
  Array.exists (fun y -> x = y) a


Similarly, inclusion of sets can be coded by 


let included (Finset a : 'a finset) (b : 'a finset) =
  Array.for_all (fun x -> mem x b) a


Finally, the equality of two sets can be tested with 


let eq (a : 'a finset) (b : 'a finset) =
  included a b && included b a


This is the right function to test equality of sets, which does not distinguish 
between two representations of the same set, and should always be used to 
compare sets, as opposed to the standard equality =. 
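

The difference between the two notions of equality can be observed on the two arrays given above. The following self-contained snippet repeats the definitions (without type annotations) and adds two test values s1 and s2, which are names of our own.

```ocaml
type 'a finset = Finset of 'a array

let mem x (Finset a) = Array.exists (fun y -> x = y) a
let included (Finset a) b = Array.for_all (fun x -> mem x b) a
let eq a b = included a b && included b a

(* Two representations of the set {1, 2, 3}. *)
let s1 = Finset [|3; 1; 2|]
let s2 = Finset [|2; 1; 3; 1; 2; 2|]

let () =
  assert (eq s1 s2);   (* equal as sets *)
  assert (s1 <> s2)    (* but distinct as arrays *)
```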

In order to get closer to set theory, we shall now implement finite sets whose 
elements are themselves finite sets, i.e. we now consider the type 


type finset = Finset of finset array 


The previous functions are now mutually recursive because membership should 
be tested with respect to the suitable notion of equality: 


let rec mem (x : 'a) (Finset a : finset) =
  Array.exists (fun y -> eq x y) a


and included (Finset a : finset) (b : finset) =
  Array.for_all (fun x -> mem x b) a


and eq (a : finset) (b : finset) =
  included a b && included b a
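

As a quick check of these mutually recursive functions, we can build the first von Neumann natural numbers as hereditarily finite sets. The snippet below repeats the definitions so as to be self-contained; the values empty, one, two and two' are illustrative names of our own.

```ocaml
type finset = Finset of finset array

let rec mem x (Finset a) = Array.exists (fun y -> eq x y) a
and included (Finset a) b = Array.for_all (fun x -> mem x b) a
and eq a b = included a b && included b a

let empty = Finset [||]              (* 0 = ∅ *)
let one = Finset [|empty|]           (* 1 = {∅} *)
let two = Finset [|empty; one|]      (* 2 = {∅, {∅}} *)

(* Another representation of 2, with a different order and a repetition. *)
let two' = Finset [|one; empty; one|]

let () =
  assert (eq two two');
  assert (mem one two);
  assert (not (eq one two))
```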
Encoding set theory in type theory. We can play the same game in type theory
and define finite sets of elements in a type A in the same way. Instead of using
arrays however, it is more natural to encode a finite set as a function of type


Fin n → A


for some natural number n: such a function f encodes the set


{f(0), f(1), …, f(n − 1)}


(we recall that Fin n is the type whose elements are (isomorphic to) natural
numbers from 0 to n − 1). We can thus define


data finset (A : Set) : Set where
  Finset : {n : ℕ} → (Fin n → A) → finset A


In order to define “sets” of elements of A, instead of finite ones, we can allow 
indexing by any type instead of Fin n. Finally, we can encode sets (in the sense 
of type theory) as sets of sets. This suggests the following encoding of sets 


data U : Set₁ where
  set : (I : Set) → (I → U) → U


which is due to Aczel [Acz78, Wer97]: a set consists of a type I of indices and a 
function which assigns a set to every element of I. In order to avoid confusion 
with the notation Set of Agda, we write U for the type of our sets. 

With this encoding the usual constructions can be performed. For instance, 
we can define the empty set: 


∅ : U
∅ = set ⊥ (λ ())


the pairing of two sets:


⟨_,_⟩ : (A B : U) → U
⟨ A , B ⟩ = set Bool (λ {false → A ; true → B})


the product of two sets:


prod : (A B : U) → U
prod (set I f) (set J g) =
  set (I × J) (λ {(i , j) → ⟨ f i , g j ⟩})


the equality of two sets (which implements the extensionality axiom): 


_==_ : (A B : U) → Set
set I f == set J g =
  ((i : I) → Σ J (λ j → f i == g j)) ×
  ((j : J) → Σ I (λ i → f i == g j))


the membership relation:


_∈_ : (A B : U) → Set
A ∈ set I f = Σ I (λ i → A == f i)


the union of sets (which implements the axiom of union): 
⋃_ : (A : U) → U
⋃ set I f =
  set (Σ I (λ i → dom (f i))) (λ {(i , j) → F (f i) j})


where the domain is given by


dom : U → Set
dom (set I _) = I


and the function is given by


F : (A : U) → dom A → U
F (set _ f) = f


the von Neumann natural numbers (which implements the axiom of infinity):


vonN : ℕ → U
vonN ℕ.zero = ∅
vonN (ℕ.suc n) = ⋃ ⟨ vonN n , [ vonN n ] ⟩


Nat : U
Nat = set ℕ vonN


and so on. 

The Russell paradox. Now, suppose that we accept this type-in-type rule which 
tells us that Type has type Type. This behavior can be achieved in Agda by 
using the flag 


{-# OPTIONS --type-in-type #-} 


at the beginning of the file. As before, we define sets as 


data U : Set where 
  set : (I : Set) → (I → U) → U 


(the careful reader will notice that the type of U is now Set instead of Set₁), 
and we consider the following notion of membership 


_∈_ : (A B : U) → Set 
A ∈ set I f = Σ I (λ i → f i ≡ A) 


This function is “wrong” because equality is tested here using propositional 
equality instead of the proper equality == between sets, but it will be enough 
for the purpose of implementing paradoxes and give rise to shorter code. We 
declare a set to be regular if it does not contain itself, which can be defined by 


regular : U → Set 
regular A = ¬ (A ∈ A) 


and consider Russell’s paradoxical set R of all sets which do not contain 
themselves, see section 5.3.1: 


R : U 
R = set (Σ U (λ A → regular A)) proj₁ 


This set can be shown to be both non-regular 


R-nonreg : ¬ (regular R) 
R-nonreg reg = reg ((R , reg) , refl) 


and regular 


R-reg : regular R 
R-reg ((A , reg) , p) = subst regular p reg ((A , reg) , p) 


from which we can deduce the inconsistency of the system: 


absurd : ⊥ 
absurd = R-nonreg R-reg 


8.2.3 Girard’s paradox. It is always good to have a handful of paradoxes 
at hand in order to test a proof assistant: depending on the logic, one or the 
other might be easier to encode. For instance, the above formalization crucially 
depends on the fact that we have inductive types, which is not the case of 
all proof assistants. As another example, we shall present Girard’s original 
paradox [Gir72], which is based on the following set-theoretic paradox which 
shows that there is no ordinal of all ordinals. 


The Burali-Forti paradox. A well-ordered set is traditionally defined as a set A 
equipped with a total order < which is well-founded, i.e. there is no infinite 
strictly decreasing sequence of elements, see section 6.8.6 and appendix A.3. 
Alternatively — and this is better suited to formalization — a well-ordered set 
can be defined as a set A equipped with a relation < which is 


— transitive, 
— well-founded, and 
— extensional. 


By extensional, we mean here that, given y, z ∈ A, if x < y is equivalent to x < z 
for every x ∈ A, then y = z: 


\[ (\forall x \in A.\ x < y \Leftrightarrow x < z) \Rightarrow y = z \] 


Two well-ordered sets are isomorphic when they are in bijection with order- 
preserving functions. An ordinal is the isomorphism class of a well-ordered set. 
An embedding of an ordinal A into an ordinal B is an increasing function f from 
A to B, i.e. such that x < y implies f(x) < f(y) for every x, y ∈ A. Such an 
embedding is bounded when there exists an element b ∈ B such that f(x) < b 
for every x ∈ A. We define a relation < on ordinals by setting A < B whenever 
there exists a bounded embedding of A into B. This relation can be shown to 
be transitive, well-founded and extensional. 

The Burali-Forti paradox [BF97] shows that the ordinal numbers do not 
form a set: they are too big to be so, in the same sense that the collection of 
all sets is too big to itself be a set. Namely, suppose that there is a set Ω of all 
ordinals. By the above, when equipped with the relation <, this would induce 
an ordinal that we still write Ω. It can be shown that for every ordinal A, we 
have A < Ω. In particular, we have Ω < Ω, which is in contradiction with the 
hypothesis that < should be well-founded. Details to follow. 

Formalizing Girard’s paradox. The Girard paradox [Gir72] is an implementation 
of the above paradox in Martin-Löf type theory with the type-in-type rule. A 
nice account of this paradox can also be found in the introduction of [ML98]. 
We present here a formalization of it. 

The notion of ordinal can be formalized in Agda by a record 


record Ord : Set where 
  field 
    car : Set 
    rel : Rel car 
    trans : Transitive rel 
    wf : WellFounded rel 


It is a 4-tuple consisting of a carrier type car and a relation rel on it (see 
section 6.5.9), together with a proof that this relation is transitive and 
well-founded. Here, the predicate of being transitive for a relation is defined by 


Transitive : {A : Set} → Rel A → Set 
Transitive {A} R = {x y z : A} → R x y → R y z → R x z 


and well-foundedness is detailed in section 6.8.6. The above definition actually 
formalizes a generalization of the notion of ordinal: in order to define traditional 
ones, we should also impose that they are extensional, i.e. a proof of Extensional 
rel, where the extensionality predicate is 


Extensional : {A : Set} → Rel A → Set 
Extensional {A} R = 
  {y z : A} → ((x : A) → R x y ↔ R x z) → y ≡ z 


but this will play no role in our proof so that we omit it for simplicity (we refer 
the reader to [Uni13, Section 10.3] for a detailed formalization of ordinals). 
Given an ordinal A, we use the more readable notation || A || for its carrier: 


||_|| : Ord → Set 
|| A || = car A 

Since ordinals are well-founded, we can use the following induction principle in 
order to reason about those: 


Ord-rec : (A : Ord) → (P : || A || → Set) → 
          ((x : || A ||) → ((y : || A ||) → rel A y x → P y) → P x) → 
          (x : || A ||) → P x 
Ord-rec A = wfRec (wf A) 


An embedding of an ordinal into another consists of a function between the 
underlying carriers together with a proof that it is increasing, and we write Emb 
A B for the type of embeddings of A into B: 


record Emb (A B : Ord) : Set where 
  field 
    fun : || A || → || B || 
    inc : ∀ {x y} → rel A x y → rel B (fun x) (fun y) 


Such an embedding is bounded by b in B when the image of every element of A is 
below b and we write BEmb A B b for the type of embeddings of A into B which 
are bounded by b: 

record BEmb (A B : Ord) (b : || B ||) : Set where 
  field 
    emb : Emb A B 
    bnd : (x : || A ||) → rel B (fun emb x) b 


Based on this, we can define the relation < on ordinals, such that A < B whenever 
there is a bounded embedding of A into B. 


_<_ : Ord → Ord → Set 
A < B = Σ || B || (λ b → BEmb A B b) 


This relation is easily shown to be transitive 


<-trans : Transitive _<_ 
<-trans (y , f) (z , g) = (fun (emb g) y) , (comp f g) 


and well-founded with some more work: 


<-wf : WellFounded _<_ 
<-wf A = acc lem 
  where 
  lem : (B : Ord) → B < A → Acc _<_ B 
  lem B (a , f) = Ord-rec A P' lem' a B f 
    where 
    P' : || A || → Set 
    P' a = (B : Ord) → BEmb B A a → Acc _<_ B 
    lem' : (a : || A ||) → ((a' : || A ||) → rel A a' a → P' a') → P' a 
    lem' a ind B f = 
      acc (λ { C (b , g) → 
        ind (fun (emb f) b) (bnd f b) C (comp g f) }) 


In words: showing that this relation is well-founded amounts to showing that 
every ordinal A is accessible, which by definition of accessibility amounts to 
showing that every ordinal B with B < A is accessible. By definition of the 
relation < on ordinals, this amounts to showing that for every embedding 
f : B → A bounded by a ∈ A, we have that B is accessible. This last property is 
written P′(a) and shown by induction on a ∈ A (with respect to the order <_A on 
the elements of the ordinal A, which is well-founded). Supposing that the 
property P′(a′) holds for every a′ ∈ A with a′ <_A a, we have to show P′(a), that 
is, given an embedding f : B → A bounded by a, that B is accessible. By 
definition of accessibility, this amounts to showing that C is accessible for every 
C < B. Given such an ordinal C, the fact that C < B means that there exists 
an embedding g : C → B which is bounded by b ∈ B. By composing f and g, 
we therefore have an embedding f ∘ g : C → A which is bounded by f(b) and 
we conclude that C is accessible by applying P′(f(b)). In the above proof, the 
composition of the embeddings is handled by the function 


comp : {A B C : Ord} {b : || B ||} {c : || C ||} 
       (f : BEmb A B b) (g : BEmb B C c) → 
       BEmb A C (fun (emb g) b) 


whose proof is left to the reader. If we suppose that we have the type-in-type 
rule with 


{-# OPTIONS --type-in-type #-} 


we can then define the ordinal Ω of all ordinals: 


Ω : Ord 
car Ω = Ord 
rel Ω = _<_ 
trans Ω = <-trans 
wf Ω = <-wf 


We can now show that Ω is the maximal element of ordinals: 


A<Ω : {A : Ord} → A < Ω 
A<Ω {A} = A , record { 
  emb = record { 
    fun = λ x → A ↓ x ; 
    inc = λ {x} {y} x<y → ↓-inc A x<y } ; 
  bnd = λ x → x , snd (↓-< A x) } 


Above, given an ordinal A, we show that we have A < Ω: we need to construct 
a bounded embedding f : A → Ω. Here, we take the function which takes 
an element x ∈ A to the ordinal A ↓ x defined as the restriction of A to the 
elements smaller than x, i.e. 


_↓_ : (A : Ord) → || A || → Ord 
car (A ↓ a) = Σ || A || (λ x → rel A x a) 
rel (A ↓ a) x y = rel A (fst x) (fst y) 
trans (A ↓ a) x<y y<z = trans A x<y y<z 
wf (A ↓ a) = ↓wf 


The proof of well-foundedness ↓wf, deduced from the well-foundedness of A, is 
left to the reader. The proofs that the embedding is increasing 


↓-inc : (A : Ord) {a b : || A ||} → rel A a b → A ↓ a < A ↓ b 


and bounded 


↓-< : (A : Ord) (a : || A ||) → A ↓ a < A 


are also left to the reader. As a particular case of the above lemma, we have 
that Ω < Ω: 


Ω<Ω : Ω < Ω 
Ω<Ω = A<Ω 


which contradicts the fact that the relation < is well-founded. Namely, any 
relation R with x R x will have an infinite decreasing sequence: x R x R x ... 
Constructively, this is of course shown by induction: 


wf-irrefl : {A : Set} (R : Rel A) → WellFounded R → 
            (x : A) → R x x → ⊥ 
wf-irrefl R wf x = 
  wfRec wf (λ x → R x x → ⊥) (λ y ind Ryy → ind y Ryy Ryy) x 


From which we see that accepting Ω as an ordinal leads to the system being 
inconsistent: 


absurd : ⊥ 
absurd = wf-irrefl _<_ <-wf Ω Ω<Ω 

Variants and other paradoxes. The Girard paradox is analyzed by Coquand 
in [Coq86] and simplified by Hurkens [Hur95]. Some other paradoxes have also 
been produced by Coquand [Coq92a, Coq95]. For instance, one is, as translated 
by Abel [Abe17]: 


{-# OPTIONS --type-in-type #-} 


data U : Set where 
  c : ({A : Set} → A → A) → U 


empty : {A : Set} → U → A 
empty (c f) = empty (f (c (λ z → z))) 


absurd : {A : Set} → A 
absurd = empty (c (λ z → z)) 


8.2.4 The hierarchy of universes. How should we fix this? If we think of the 
situation we already faced when considering naive set theory, the explanation 
was that the collection of all sets was “too big” to be a set. Similarly, we think 
of Type as being “too big” to be a type. However, we still need to give it a 
type, and the natural next move is to introduce a new constructor, say TYPE, 
which is the type of “big types”, together with the rule 


\[ \frac{\Gamma \vdash}{\Gamma \vdash \mathrm{Type} : \mathrm{TYPE}} \] 


stating that Type is a big type. However, we now need to give a type to TYPE, 
which forces us to introduce a type of “very big types” and so on. 

In the end, we introduce a hierarchy of types Type_i indexed by natural 
numbers i ∈ N, together with the rule 


\[ \frac{\Gamma \vdash}{\Gamma \vdash \mathrm{Type}_i : \mathrm{Type}_{i+1}} \] 


for every i ∈ N. The type Type is simply a notation for Type_0, Type_1 is the 
type of “big types”, Type_2 is the type of “very big types”, Type_3 is the type of 
“very very big types”, and so on: 


\[ \mathrm{Type}_0 : \mathrm{Type}_1 : \mathrm{Type}_2 : \mathrm{Type}_3 : \ldots \] 


The types Type_i are called universes and i is called the level of the universe 
Type_i. In order to make the theory more manageable, we also add a cumulativity 
rule 


\[ \frac{\Gamma \vdash A : \mathrm{Type}_i}{\Gamma \vdash A : \mathrm{Type}_{i+1}} \] 


which states that a “small” type can always be seen as a “bigger” type. This 
allows us to see a type in a given universe as a type in a universe of higher 
level, so that all constructions can be cast into higher levels if necessary and 
we do not have to precisely take care of the levels. Finally, we change all the 
type formation rules by adding levels to occurrences of Type. For instance, the 
formation rule for Π-types becomes 


\[ \frac{\Gamma \vdash A : \mathrm{Type}_i \qquad \Gamma, x : A \vdash B : \mathrm{Type}_i}{\Gamma \vdash \Pi(x : A).B : \mathrm{Type}_i} \; (\Pi_{\mathrm{F}}) \] 


In the following, except in this section, we will not be precise on the universe 
levels and still allow ourselves to use the type Type. However, it can be checked 
that all the subsequent constructions can be adapted as above in order to 
properly take levels into account. 


Universes in Agda. In Agda, Set is a notation for Type_0, Set₁ is a notation for 
Type_1 and so on. For instance, we can define the type of predicates on a type A 
as 


Predicate : (A : Set) → Set₁ 
Predicate A = A → Set 


However, if we try to define Predicate as being of type (A : Set) → Set, Agda 
will complain that we are trying to fit a type in Type_1 into Type_0 by issuing 
the following error message: 


Set₁ != Set 
when checking that the expression A → Set has type Set 


Cumulative universes. Systems like Coq have the cumulativity rule built-in, but 
systems such as Agda chose not to, mostly for technical reasons. Since we don’t 
have it, the type formation rules now have to allow constructors to have different 
levels, and for instance the formation rule for Π-types has to be changed to 


\[ \frac{\Gamma \vdash A : \mathrm{Type}_i \qquad \Gamma, x : A \vdash B : \mathrm{Type}_j}{\Gamma \vdash \Pi(x : A).B : \mathrm{Type}_{\max(i,j)}} \; (\Pi_{\mathrm{F}}) \] 


We thus need three operations on levels: 
— we need to have a level 0, 


— for every level i we need to have a successor level i + 1 (in order to type 
Type_i), and 


— we need to be able to compute the maximum of two levels. 


This is why in Agda levels are defined in the module Level by 


postulate 
  Level : Set 
  lzero : Level 
  lsuc : (i : Level) → Level 
  _⊔_ : (i j : Level) → Level 
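
To see how the three operations on levels are used, here is a small OCaml sketch which computes the level of the smallest universe containing a type. The syntax ty is a hypothetical miniature one introduced only for this illustration (a base type, the universes and non-dependent function types), and levels are represented by plain integers rather than by Agda’s Level.

```ocaml
(* Hypothetical miniature syntax of types, for illustration only. *)
type ty =
  | Base           (* some small type, such as the natural numbers *)
  | Univ of int    (* the universe Type_i *)
  | Pi of ty * ty  (* non-dependent function type from A to B *)

(* Level i of the smallest universe Type_i containing the given type. *)
let rec level = function
  | Base -> 0                              (* level 0 *)
  | Univ i -> i + 1                        (* Type_i : Type_{i+1} *)
  | Pi (a, b) -> max (level a) (level b)   (* formation rule with max(i,j) *)
```

For instance, level (Pi (Base, Univ 0)) is 1, which matches the fact that a type of predicates A → Set lives in Set₁ and not in Set.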

Universe polymorphism. Up to now, we have been defining equality as 


data _≡_ {A : Set} (x : A) : A → Set where 
  refl : x ≡ x 


This means that we can use it to compare the elements of a small type (for 
instance, we can use it to compare natural numbers), but we cannot use it to 
compare elements of a large type, typically types. For instance, suppose that 
we want to show that the type ℕ is different from the empty type ⊥, i.e. we want 
to prove: 


ℕ≢⊥ : ¬ (ℕ ≡ ⊥) 


If we try to prove this with the above definition for equality, Agda complains 
that 


Set₁ != Set when checking that the expression ℕ has type Set 


This is because we are trying to compare the types ℕ and ⊥, which are of type 
Set, which is itself of type Set₁, whereas equality is defined on elements of a 
type A whose type is Set. To overcome this, we could define another equality ≡₁, 
which allows for comparing types: 


data _≡₁_ {A : Set₁} (x : A) : A → Set where 
  refl : x ≡₁ x 


But this is quite unsatisfactory. Apart from the subscript “₁”, this definition 
is essentially the same as the one for ≡, and we have to prove again for this 
notion of equality all the properties we have already proved for ≡, by copying 
the proofs and inserting “₁” from time to time, which means lots of duplication 
of code. Moreover, we would have to do this once again if we want to compare 
elements of a type whose type is Set₂, Set₃, and so on. 

In order to solve this problem, Agda allows for defining functions on type 
Type_i for every level i: this is called universe polymorphism. This means that we 
can define functions which can take universe levels as arguments. For instance, 
we can define equality as 


data _≡_ {i : Level} {A : Set i} (x : A) : A → Set where 
  refl : x ≡ x 


As you can observe, the Agda notation for Type_i is Set i. This definition 
depends on a universe level i which is implicit and thus automatically inferred, 
and thus allows for comparing elements of types of any level. The actual 
definition of equality in the standard library is this one and we can now finish our 
example with 


ℕ≢⊥ : ¬ (ℕ ≡ ⊥) 
ℕ≢⊥ p with coe p 0 
ℕ≢⊥ p | () 


Lifting. As another application of universe polymorphism, we can derive in Agda 
the cumulativity of universes: we can construct a function Lift which takes an 
element of Type_i and casts it as an element of Type_j with j ≥ i, called the 
lifting of the type. Since we do not have access to the order on levels, but 
can compute the maximum ⊔ of two levels, we actually rather give it the type 
Type_i → Type_{i⊔j}, which also ensures that the returned level is greater than the 
one given as input. The definition is performed based on the observation that 
the lifted type should have the “same” elements as the original one, which can 
be expressed by the following inductive type: 


data Lift {i} j (A : Set i) : Set (i ⊔ j) where 
  lift : A → Lift j A 


8.3 More type constructors 


In this section, we give the rules in order to add many of the usual type construc- 
tors in dependent type theory. All those will be subsumed by inductive types 
introduced in section 8.4: as in Agda, these constructions can be implemented 
as particular inductive types. 


8.3.1 Empty type. For the empty type, or falsity, we add the following two 
constructions to expressions 


e ::= ... | ⊥ | bot(e, x ↦ e′) 


The type ⊥ is the type for falsity, which is empty, and the construction 


bot(t, x ↦ A) 


eliminates a proof t of ⊥ in order to construct an element of type A (which 
might depend on t via the variable x). The arrow ↦ is only a formal notation 
here, and does not mean a function: bot is a formal constructor which takes 
as argument an expression t, a variable x and an expression A. However, the 
variable x is bound in A, and could be renamed to any other variable name. In 
Agda, it would correspond to the operation which matches a proof of ⊥ in order 
to produce an A, i.e. something like 


bot : (x : ⊥) → A x 
bot () 


The rules are as follows. 


Formation. ⊥ is a valid type in any valid context: 


\[ \frac{\Gamma \vdash}{\Gamma \vdash \bot : \mathrm{Type}} \; (\bot_{\mathrm{F}}) \] 


Introduction. There is no introduction rule, because we do not expect that there 
is a way to prove falsity. 


Elimination. Elimination allows proving anything from falsity: 


\[ \frac{\Gamma \vdash t : \bot \qquad \Gamma, x : \bot \vdash A : \mathrm{Type}}{\Gamma \vdash \mathrm{bot}(t, x \mapsto A) : A[t/x]} \; (\bot_{\mathrm{E}}) \] 


Computation. No rule. 

Uniqueness. No rule. 


8.3.2 Unit type. For the unit type, or truth, we add the following construc- 
tions to expressions: 


e ::= ... | ⊤ | ⋆ | top(e, x ↦ e′, e″) 


where ⊤ is the type for truth, ⋆ is the constructor for truth and 


top(t, x ↦ A, u) 


eliminates a proof t of ⊤ in order to construct a proof u of A. 


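Formation. 

\[ \frac{\Gamma \vdash}{\Gamma \vdash \top : \mathrm{Type}} \; (\top_{\mathrm{F}}) \] 

Introduction. 

\[ \frac{\Gamma \vdash}{\Gamma \vdash \star : \top} \; (\top_{\mathrm{I}}) \] 

Elimination. 

\[ \frac{\Gamma \vdash t : \top \qquad \Gamma, x : \top \vdash A : \mathrm{Type} \qquad \Gamma \vdash u : A[\star/x]}{\Gamma \vdash \mathrm{top}(t, x \mapsto A, u) : A[t/x]} \; (\top_{\mathrm{E}}) \] 

Computation. 

\[ \frac{\Gamma, x : \top \vdash A : \mathrm{Type} \qquad \Gamma \vdash u : A[\star/x]}{\Gamma \vdash \mathrm{top}(\star, x \mapsto A, u) = u : A[\star/x]} \; (\top_{\mathrm{C}}) \] 

Uniqueness. 

\[ \frac{\Gamma \vdash t : \top}{\Gamma \vdash t = \star : \top} \; (\top_{\mathrm{U}}) \] 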


In OCaml. The type ⊤ corresponds to unit, the constructor ⋆ to (), the 
eliminator top(t, x ↦ A, u) to 


match t with 
| () -> u 


the computation rule says that 


match () with 
| () -> u 


evaluates to u, and uniqueness says that () is the only value of type unit. 

8.3.3 Products. For the product, or conjunction, of two types, we add the 
following constructions to expressions: 


e ::= ... | e × e′ | (e, e′) | unpair(e, z ↦ e′, (x, y) ↦ e″) 


The type A × B is the product of A and B (it is sometimes also written A ∧ B). 
The term (t, u) is the pair of two terms t and u and 


unpair(t, z ↦ A, (x, y) ↦ u) 


eliminates a pair t, extracting its components x and y, in order to construct a 
proof u whose type is A which might depend on t as z. 


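Formation. 

\[ \frac{\Gamma \vdash A : \mathrm{Type} \qquad \Gamma \vdash B : \mathrm{Type}}{\Gamma \vdash A \times B : \mathrm{Type}} \; (\times_{\mathrm{F}}) \] 

Introduction. 

\[ \frac{\Gamma \vdash t : A \qquad \Gamma \vdash u : B}{\Gamma \vdash (t, u) : A \times B} \; (\times_{\mathrm{I}}) \] 

Elimination. 

\[ \frac{\Gamma \vdash t : A \times B \qquad \Gamma, z : A \times B \vdash C : \mathrm{Type} \qquad \Gamma, x : A, y : B \vdash u : C[(x,y)/z]}{\Gamma \vdash \mathrm{unpair}(t, z \mapsto C, (x,y) \mapsto u) : C[t/z]} \; (\times_{\mathrm{E}}) \] 

Computation. 

\[ \frac{\Gamma \vdash t : A \qquad \Gamma \vdash u : B \qquad \Gamma, z : A \times B \vdash C : \mathrm{Type} \qquad \Gamma, x : A, y : B \vdash v : C[(x,y)/z]}{\Gamma \vdash \mathrm{unpair}((t,u), z \mapsto C, (x,y) \mapsto v) = v[t/x, u/y] : C[(t,u)/z]} \; (\times_{\mathrm{C}}) \] 

Uniqueness. 

\[ \frac{\Gamma \vdash t : A \times B}{\Gamma \vdash \mathrm{unpair}(t, z \mapsto A \times B, (x,y) \mapsto (x,y)) = t : A \times B} \; (\times_{\mathrm{U}}) \] 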


In OCaml. The type A × B corresponds to a * b, the term (t, u) to the pair 
(t , u), the eliminator unpair(t, z ↦ A, (x, y) ↦ u) to 


match t with 
| (x, y) -> u 


the computation rule says that 


match (t , u) with 
| (x, y) -> v x y 


evaluates to v t u, and the uniqueness rule says that 


match t with 
| (x, y) -> (x, y) 


is the same as t. 
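
The two rules can be checked on concrete values with the following sketch, in which the eliminator is written as a function unpair (a name we choose here by analogy with the rules). It is only the non-dependent version: the result type cannot depend on the pair, and the body of the eliminator is passed as a function of the two components.

```ocaml
(* Non-dependent eliminator for pairs: u receives the two components. *)
let unpair (t : 'a * 'b) (u : 'a -> 'b -> 'c) : 'c =
  match t with
  | (x, y) -> u x y

(* Computation rule: eliminating an explicit pair passes its components. *)
let computation = unpair (3, "a") (fun x y -> string_of_int x ^ y)

(* Uniqueness rule: rebuilding the pair from its components gives it back. *)
let uniqueness = unpair (1, true) (fun x y -> (x, y))
```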

8.3.4 Dependent sums. Dependent sums, or Σ-types, are a generalization of 
the previous notion of product, where the type of the second component might 
depend on the term of the first component. Such a type is written 


Σ(x : A).B 


and the elements of this type are the pairs (t, u) consisting of a term t of type 
A and a term u of type B[t/x]. From a logical point of view, this corresponds 
to an existential quantification 


∃x ∈ A.B 


Namely, a proof of such a proposition consists of a term t in A together with 
a proof that B(t) is satisfied. Formally, we add the following constructions to 
expressions: 


e ::= ... | Σ(x : A).B | (e, e′) | unpair(e, z ↦ e′, (x, y) ↦ e″) 


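Formation. 

\[ \frac{\Gamma \vdash A : \mathrm{Type} \qquad \Gamma, x : A \vdash B : \mathrm{Type}}{\Gamma \vdash \Sigma(x : A).B : \mathrm{Type}} \; (\Sigma_{\mathrm{F}}) \] 

Introduction. 

\[ \frac{\Gamma, x : A \vdash B : \mathrm{Type} \qquad \Gamma \vdash t : A \qquad \Gamma \vdash u : B[t/x]}{\Gamma \vdash (t, u) : \Sigma(x : A).B} \; (\Sigma_{\mathrm{I}}) \] 

Elimination. 

\[ \frac{\Gamma \vdash t : \Sigma(x : A).B \qquad \Gamma, z : \Sigma(x : A).B \vdash C : \mathrm{Type} \qquad \Gamma, x : A, y : B \vdash u : C[(x,y)/z]}{\Gamma \vdash \mathrm{unpair}(t, z \mapsto C, (x,y) \mapsto u) : C[t/z]} \; (\Sigma_{\mathrm{E}}) \] 

Computation. 

\[ \frac{\Gamma \vdash t : A \qquad \Gamma \vdash u : B[t/x] \qquad \Gamma, z : \Sigma(x : A).B \vdash C : \mathrm{Type} \qquad \Gamma, x : A, y : B \vdash v : C[(x,y)/z]}{\Gamma \vdash \mathrm{unpair}((t,u), z \mapsto C, (x,y) \mapsto v) = v[t/x, u/y] : C[(t,u)/z]} \; (\Sigma_{\mathrm{C}}) \] 

Uniqueness. 

\[ \frac{\Gamma \vdash t : \Sigma(x : A).B}{\Gamma \vdash \mathrm{unpair}(t, z \mapsto \Sigma(x : A).B, (x,y) \mapsto (x,y)) = t : \Sigma(x : A).B} \; (\Sigma_{\mathrm{U}}) \] 

Pairs. A product A × B is a particular case of a Σ-type Σ(x : A).B which is not 
dependent, i.e. x ∉ FV(B). In other words, by setting 

\[ A \times B = \Sigma(\_ : A).B \] 

where _ is a variable which never occurs in B, we recover the rules previously 
given for products. 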

It might be puzzling at first that a product would correspond to a sum, but 
one should recall that this is actually already the case for natural numbers 


\[ m \times n = \sum_{i=0}^{m-1} n \] 


Here, a dependent sum would rather correspond to summing a finite family 
(n_i)_{0≤i<m} of natural numbers: 


\[ \sum_{i=0}^{m-1} n_i \] 


and a product is a particular case of this where the family is constant (i.e. n_i = n 
for every index i). 
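
This counting argument can be tested with a short OCaml sketch. The functions dependent_pairs and sum are illustrative names of ours, and integers below a bound stand in for finite types: we enumerate the pairs (i, j) whose second component ranges over a set depending on the first, and compare their number with the sum of the family.

```ocaml
(* All pairs (i, j) with 0 <= i < m and 0 <= j < n i: the range of the
   second component depends on the first component. *)
let dependent_pairs (m : int) (n : int -> int) : (int * int) list =
  List.concat_map
    (fun i -> List.init (n i) (fun j -> (i, j)))
    (List.init m (fun i -> i))

(* Sum of the family (n i) for 0 <= i < m. *)
let sum (m : int) (n : int -> int) : int =
  List.fold_left (fun acc i -> acc + n i) 0 (List.init m (fun i -> i))
```

When the family is constant, say n i = 4 for every i below 3, the number of pairs is the ordinary product 3 × 4.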


8.3.5 Coproducts. For coproducts, we add the following constructions to 
expressions: 


e ::= ... | e + e′ | ι_l^e(e′) | ι_r^e(e′) | case(e, z ↦ e′, x ↦ e″, y ↦ e‴) 


The type A + B is the coproduct of A and B, which logically corresponds to their 
disjunction. The elements of this type are either a term t of A, written ι_l^B(t), or 
a term u of B, written ι_r^A(u), and the eliminator case(t, z ↦ C, x ↦ u, y ↦ v) 
eliminates t to construct a term of type C (which might depend on t as z) by 
considering whether it is of the first or the second form, in which case u or v is 
returned. 


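Formation. 

\[ \frac{\Gamma \vdash A : \mathrm{Type} \qquad \Gamma \vdash B : \mathrm{Type}}{\Gamma \vdash A + B : \mathrm{Type}} \; (+_{\mathrm{F}}) \] 

Introduction. 

\[ \frac{\Gamma \vdash t : A \qquad \Gamma \vdash B : \mathrm{Type}}{\Gamma \vdash \iota_l^B(t) : A + B} \; (+_{\mathrm{I}}^l) \qquad \frac{\Gamma \vdash A : \mathrm{Type} \qquad \Gamma \vdash t : B}{\Gamma \vdash \iota_r^A(t) : A + B} \; (+_{\mathrm{I}}^r) \] 

Elimination. 

\[ \frac{\Gamma \vdash t : A + B \qquad \Gamma, z : A + B \vdash C : \mathrm{Type} \qquad \Gamma, x : A \vdash u : C[\iota_l^B(x)/z] \qquad \Gamma, y : B \vdash v : C[\iota_r^A(y)/z]}{\Gamma \vdash \mathrm{case}(t, z \mapsto C, x \mapsto u, y \mapsto v) : C[t/z]} \; (+_{\mathrm{E}}) \] 

Computation. 

\[ \frac{\Gamma \vdash t : A \qquad \Gamma \vdash B : \mathrm{Type} \qquad \Gamma, z : A + B \vdash C : \mathrm{Type} \qquad \Gamma, x : A \vdash u : C[\iota_l^B(x)/z] \qquad \Gamma, y : B \vdash v : C[\iota_r^A(y)/z]}{\Gamma \vdash \mathrm{case}(\iota_l^B(t), z \mapsto C, x \mapsto u, y \mapsto v) = u[t/x] : C[\iota_l^B(t)/z]} \; (+_{\mathrm{C}}^l) \] 

\[ \frac{\Gamma \vdash A : \mathrm{Type} \qquad \Gamma \vdash t : B \qquad \Gamma, z : A + B \vdash C : \mathrm{Type} \qquad \Gamma, x : A \vdash u : C[\iota_l^B(x)/z] \qquad \Gamma, y : B \vdash v : C[\iota_r^A(y)/z]}{\Gamma \vdash \mathrm{case}(\iota_r^A(t), z \mapsto C, x \mapsto u, y \mapsto v) = v[t/y] : C[\iota_r^A(t)/z]} \; (+_{\mathrm{C}}^r) \] 

Uniqueness. 

\[ \frac{\Gamma \vdash t : A + B}{\Gamma \vdash \mathrm{case}(t, z \mapsto A + B, x \mapsto \iota_l^B(x), y \mapsto \iota_r^A(y)) = t : A + B} \; (+_{\mathrm{U}}) \] 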


In OCaml. The type A + B corresponds to an inductive type of the form 


type ('a , 'b) coprod = 
  | Left of 'a 
  | Right of 'b 


ι_l^B(t) to Left t, ι_r^A(t) to Right t and the eliminator 
case(t, z ↦ C, x ↦ u, y ↦ v) 
to 


match t with 
| Left x -> u 
| Right y -> v 


The left computation rule says that 


match Left t with 
| Left x -> u x 
| Right y -> v y 


reduces to u t (and similarly for the right one) and the uniqueness rule says 
that 


match t with 
| Left x -> Left x 
| Right y -> Right y 


is the same as t. 
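
As a quick check, the eliminator can be packaged as a function and tried on values. In this sketch, case is our name for the non-dependent eliminator (the result type does not depend on the scrutinee), and the two branches are passed as functions.

```ocaml
type ('a, 'b) coprod =
  | Left of 'a
  | Right of 'b

(* Non-dependent eliminator: u handles the left case, v the right one. *)
let case (t : ('a, 'b) coprod) (u : 'a -> 'c) (v : 'b -> 'c) : 'c =
  match t with
  | Left x -> u x
  | Right y -> v y

(* Computation rules on explicit injections. *)
let left_result = case (Left 2) (fun x -> x + 1) String.length
let right_result = case (Right "abcd") (fun x -> x + 1) String.length

(* Uniqueness rule: re-injecting each case gives back the original term. *)
let t : (int, string) coprod = Right "a"
let rebuilt = case t (fun x -> Left x) (fun y -> Right y)
```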


In Agda. The standard notation for + is ⊎ and the notations for ι_l and ι_r are 
respectively inj₁ and inj₂, see section 6.5.6. 


8.3.6 Booleans. For booleans, we add the following constructions to 
expressions: 


e ::= ... | Bool | 1 | 0 | ite(e, x ↦ e′, e″, e‴) 


where Bool is the type of booleans, 1 and 0 are true and false respectively and 
ite(t, x ↦ A, u, v) is the conditional which returns u or v depending on whether t is 
true or false. 


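Formation. 

\[ \frac{\Gamma \vdash}{\Gamma \vdash \mathrm{Bool} : \mathrm{Type}} \; (\mathrm{Bool}_{\mathrm{F}}) \] 

Introduction. 

\[ \frac{\Gamma \vdash}{\Gamma \vdash 1 : \mathrm{Bool}} \; (\mathrm{Bool}_{\mathrm{I}}^1) \qquad \frac{\Gamma \vdash}{\Gamma \vdash 0 : \mathrm{Bool}} \; (\mathrm{Bool}_{\mathrm{I}}^0) \] 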



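Elimination. 

\[ \frac{\Gamma \vdash t : \mathrm{Bool} \qquad \Gamma, x : \mathrm{Bool} \vdash A : \mathrm{Type} \qquad \Gamma \vdash u : A[1/x] \qquad \Gamma \vdash v : A[0/x]}{\Gamma \vdash \mathrm{ite}(t, x \mapsto A, u, v) : A[t/x]} \; (\mathrm{Bool}_{\mathrm{E}}) \] 

Computation. 

\[ \frac{\Gamma, x : \mathrm{Bool} \vdash A : \mathrm{Type} \qquad \Gamma \vdash u : A[1/x] \qquad \Gamma \vdash v : A[0/x]}{\Gamma \vdash \mathrm{ite}(1, x \mapsto A, u, v) = u : A[1/x]} \; (\mathrm{Bool}_{\mathrm{C}}^1) \] 

\[ \frac{\Gamma, x : \mathrm{Bool} \vdash A : \mathrm{Type} \qquad \Gamma \vdash u : A[1/x] \qquad \Gamma \vdash v : A[0/x]}{\Gamma \vdash \mathrm{ite}(0, x \mapsto A, u, v) = v : A[0/x]} \; (\mathrm{Bool}_{\mathrm{C}}^0) \] 

Uniqueness. 

\[ \frac{\Gamma \vdash t : \mathrm{Bool}}{\Gamma \vdash \mathrm{ite}(t, x \mapsto \mathrm{Bool}, 1, 0) = t : \mathrm{Bool}} \; (\mathrm{Bool}_{\mathrm{U}}) \] 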


In OCaml. Bool corresponds to the type bool, 1 and 0 correspond to true and 
false respectively and the eliminator ite(t, x ↦ A, u, v) corresponds to 


if t then u else v 


The computation rule says that 


if true then u else v 


reduces to u and that 


if false then u else v 


reduces to v, and the uniqueness rule says that 


if t then true else false 


is the same as t. 

8.3.7 Natural numbers. For natural numbers, we add the following 
constructions to expressions: 


e ::= ... | Nat | Z | S(e) | rec(e, x ↦ e′, e″, xy ↦ e‴) 


where Nat is the type of natural numbers, Z is zero, S(t) is the successor of t 
and rec(t, x ↦ A, u, xy ↦ v) is the induction principle on t: u is the base case 
and v is the inductive case. 


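Formation. 

\[ \frac{\Gamma \vdash}{\Gamma \vdash \mathrm{Nat} : \mathrm{Type}} \; (\mathrm{Nat}_{\mathrm{F}}) \] 

Introduction. 

\[ \frac{\Gamma \vdash}{\Gamma \vdash \mathrm{Z} : \mathrm{Nat}} \; (\mathrm{Nat}_{\mathrm{I}}^{\mathrm{Z}}) \qquad \frac{\Gamma \vdash t : \mathrm{Nat}}{\Gamma \vdash \mathrm{S}(t) : \mathrm{Nat}} \; (\mathrm{Nat}_{\mathrm{I}}^{\mathrm{S}}) \] 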




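Elimination. 

\[ \frac{\Gamma \vdash t : \mathrm{Nat} \qquad \Gamma, x : \mathrm{Nat} \vdash A : \mathrm{Type} \qquad \Gamma \vdash u : A[\mathrm{Z}/x] \qquad \Gamma, x : \mathrm{Nat}, y : A \vdash v : A[\mathrm{S}(x)/x]}{\Gamma \vdash \mathrm{rec}(t, x \mapsto A, u, xy \mapsto v) : A[t/x]} \; (\mathrm{Nat}_{\mathrm{E}}) \] 

Computation. 

\[ \frac{\Gamma, x : \mathrm{Nat} \vdash A : \mathrm{Type} \qquad \Gamma \vdash u : A[\mathrm{Z}/x] \qquad \Gamma, x : \mathrm{Nat}, y : A \vdash v : A[\mathrm{S}(x)/x]}{\Gamma \vdash \mathrm{rec}(\mathrm{Z}, x \mapsto A, u, xy \mapsto v) = u : A[\mathrm{Z}/x]} \; (\mathrm{Nat}_{\mathrm{C}}^{\mathrm{Z}}) \] 

\[ \frac{\Gamma \vdash t : \mathrm{Nat} \qquad \Gamma, x : \mathrm{Nat} \vdash A : \mathrm{Type} \qquad \Gamma \vdash u : A[\mathrm{Z}/x] \qquad \Gamma, x : \mathrm{Nat}, y : A \vdash v : A[\mathrm{S}(x)/x]}{\Gamma \vdash \mathrm{rec}(\mathrm{S}(t), x \mapsto A, u, xy \mapsto v) = v[t/x, \mathrm{rec}(t, x \mapsto A, u, xy \mapsto v)/y] : A[\mathrm{S}(t)/x]} \; (\mathrm{Nat}_{\mathrm{C}}^{\mathrm{S}}) \] 

Uniqueness. 

\[ \frac{\Gamma \vdash t : \mathrm{Nat}}{\Gamma \vdash \mathrm{rec}(t, x \mapsto \mathrm{Nat}, \mathrm{Z}, xy \mapsto \mathrm{S}(y)) = t : \mathrm{Nat}} \; (\mathrm{Nat}_{\mathrm{U}}) \] 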


In OCaml. The type Nat corresponds to the type 


type nat = 
  | Z 
  | S of nat 


where the constructors Z and S respectively correspond to Z and S, and the 
eliminator rec(t, x ↦ A, u, xy ↦ v) to 


let rec ind t = 
  match t with 
  | Z -> u 
  | S x -> let y = ind x in v 


The computation rule says that 


ind Z 


reduces to u and 


ind (S t) 


reduces to v where x has been replaced by t and y by ind t, and the uniqueness 
rule says that 


let rec ind t = 
  match t with 
  | Z -> Z 
  | S x -> let y = ind x in S y 


is the identity function. 
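
The eliminator can also be written once and for all as a higher-order function. The following sketch gives a non-dependent recursor, which we call rec_nat, and uses it to define addition. The names rec_nat, add and to_int are ours, and the motive A is replaced by an ordinary type variable since OCaml has no dependent types.

```ocaml
type nat =
  | Z
  | S of nat

(* Non-dependent recursor: u is the base case and v x y the inductive
   case, where y is the result of the recursive call on x. *)
let rec rec_nat (t : nat) (u : 'a) (v : nat -> 'a -> 'a) : 'a =
  match t with
  | Z -> u
  | S x -> v x (rec_nat x u v)

(* Addition, by recursion on the first argument. *)
let add (m : nat) (n : nat) : nat = rec_nat m n (fun _ y -> S y)

(* Conversion to machine integers, for displaying results. *)
let rec to_int (n : nat) : int =
  match n with
  | Z -> 0
  | S n -> 1 + to_int n
```

The uniqueness rule corresponds to the fact that rec_nat t Z (fun _ y -> S y) computes t itself.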

8.3.8 Other type constructors. There are two fundamental type 
constructions which were not given in this section: inductive types are presented in 
section 8.4 and identity types are presented in section 9.1. 


8.4 Inductive types 


We now present how to formalize general inductive types in type theory. We 
have already seen lots of examples in Agda in sections 6.4 and 6.5. For instance, 
the type of booleans is 


data Bool : Set where 
  false : Bool 
  true : Bool 


the type of natural numbers is 


data ℕ : Set where 
  zero : ℕ 
  suc : ℕ → ℕ 


the type of (rooted planar) binary trees is 


data BTree : Set where 
  leaf : BTree 
  node : BTree → BTree → BTree 


the type of (rooted planar) trees is 


data Tree : Set where 
  nil : Tree 
  node : List Tree → Tree 


the type of lists is 


data List (A : Set) : Set where 
  nil : List A 
  cons : A → List A → List A 


the type of vectors is 


data Vec (A : Set) : ℕ → Set where 
  nil : Vec A zero 
  cons : {n : ℕ} → A → Vec A n → Vec A (suc n) 


the type of finite sets is 


data Fin : ℕ → Set where 
  zero : {n : ℕ} → Fin (suc n) 
  suc : {n : ℕ} → Fin n → Fin (suc n) 


and so on. 

8.4.1 W-types. The study and formalization of inductive types is notoriously 
difficult and a source of bugs and inconsistencies. For simplicity, we begin by 
studying a very restricted form of inductive types A, called polynomial types or 
W-types, which are defined in such a way that each constructor takes a finite 
number of arguments of type A (we will see that we can also easily generalize 
this to accept arguments whose type does not involve A; the most important 
part of the restriction is that constructors cannot have arguments whose type 
involves A in non-trivial ways, such as having an argument of type A → A). In 
pseudo-Agda code, such a type would be defined as 


data A : Set where 
  C₁ : A → ... → A → A 
  C₂ : A → ... → A → A 
  Cₙ : A → ... → A → A 


where A is the inductive type and the C; are the constructors. For instance, the 
type Bool of booleans, Nat of natural numbers and the type BTree of binary 
trees are of this form. In particular, the type BTree has two constructors (leaf 
and node), respectively taking 0 and 2 arguments. 

Such a type is entirely characterized by 


— a number n of constructors, and 


— a function f : {0, ..., n − 1} → N which to i associates the number of 
arguments of the i-th constructor. 


For instance, 
— for booleans, we have n = 2 and f(0) = f(1) =0, 


— for natural numbers, we have n = 2, f(0) =0 and f(1) = 1 (the 0-th and 
1-st constructors are respectively zero and successor), 


— for binary trees, we have n = 2, f(0) =0 and f(1) = 2 (the 0-th and 1-st 
constructors are respectively leaf and node). 


The problem with this data, namely the pair (n, f), is that it does not consist of 
types, and thus does not allow for very natural formalization in terms of typing 
rules. We will see below that it can however be encoded quite naturally into 
types. 


Finite families of types. Suppose that our type theory contains the type ⊥ with 0 
elements (section 8.3.1), the type ⊤ with 1 element (section 8.3.2) and coproducts 
(section 8.3.5). Given a natural number n, we can build a type Fin_n with n 
elements as 


Fin_n = ⊤ + ⊤ + ... + ⊤ 


the sum being ⊥ in the case n = 0. For instance, the type Fin_4 with 4 elements 
is 


Fin_4 = ⊤ + (⊤ + (⊤ + ⊤)) 


A typical element of this type is ι_r(ι_r(ι_l(⋆))), but we will simplify the notations 
and write 0, 1, 2 and 3 for its elements. In Agda, we have already encountered 
this type in section 6.4.8. It can be noted that, given a type A, defining a 
function f : Fin_n → A precisely amounts to specifying n elements of A, those 
elements being f(0), ..., f(n − 1). 
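
This last remark can be made concrete in OCaml. In the sketch below, which is only an approximation, the integers 0, ..., n − 1 stand in for the elements of Fin_n (OCaml cannot enforce the bound in the type), and elements and lookup are names we introduce for the two directions of the correspondence.

```ocaml
(* From a function on the indices 0, ..., n-1 to the n elements it picks. *)
let elements (n : int) (f : int -> 'a) : 'a array = Array.init n f

(* Conversely, n elements determine a function on the indices. *)
let lookup (a : 'a array) : int -> 'a = fun i -> a.(i)
```

Going from a function to its array of values and back gives a function which agrees with the original one on every index below n.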


W-types. Now that we have made the previous remark, we can reformulate our 
definition of inductive types using types with a finite number of elements instead 
of natural numbers. A polynomial type consists of 


— a type A with n elements, for some natural number n, 


— for every element x of type A, a type B(x) with n_x elements for some 
natural number n_x. 


In other words, it consists of a pair (A, B), with 

A : Type        B : A → Type 


such that A = Fin_n for some natural number n and, for every x : A, we have 
B(x) = Fin_{n_x} for some natural number n_x. It turns out that this restriction to 
the case where A and B(x) are finite types is not very useful in the following, 
so that we will drop it. Having an infinite type A (e.g. natural numbers) 
corresponds to having an infinite number of constructors, which seems worrying at 
first, but we will see that it is actually reasonable and useful. 

Given a type A, and a type B which might have x as free variable, we write 


W(x : A).B 


for the inductive type defined by this data and call it a W-type. Again, this 
should be thought of as an inductive type with a constructor for each element x 
of type A, this constructor taking as many arguments as there are elements 
in B(x). The constructor W is binding x in B, and α-conversion allows us to 
rename it as we want. 


Example 8.4.1.1. The type of binary trees can be defined by 
A= Fing B(O) = Fino B(1) = Fing 


We now wonder what the terms of type W(x : A).B look like. Consider the
type of binary trees as defined in Agda above. A typical element of this type is

node (node leaf (node leaf leaf)) leaf

which consists of the constructor node, applied to two binary trees: the trees
node leaf (node leaf leaf) and leaf. More generally, an element of the type
W(x : A).B consists of

- a constructor, i.e. an element a of A, and

- n elements of W(x : A).B, where n is the number of elements of the type
  B(a), which are most naturally specified by giving a function B(a) → W(x : A).B.

W-types in Agda. The previous reformulation directly allows us to define
W-types in Agda as follows:


data W (A : Set) (B : A → Set) : Set where
  sup : (a : A) → (B a → W A B) → W A B


The only constructor sup allows constructing an element of W(x : A).B by
specifying a constructor in A and arguments in the W-type, as explained above.

For instance, the type of natural numbers has two constructors, so that we
can take A = Bool where, by convention, false corresponds to the constructor
zero and true to suc. The first constructor takes zero arguments, which means
that B false should be an empty type (we can take ⊥), and the second takes one
argument, so that B true should be a type with one element (we can take ⊤).
We can thus define:


Nat : Set
Nat = W Bool (λ { false → ⊥ ; true → ⊤ })


Up to some syntactical heaviness (such as having to write booleans to call the 
constructors), this is precisely the usual inductive type for natural numbers. For 
instance, addition can be programmed “as usual”: 


_+_ : Nat → Nat → Nat
sup false _ + n = n
sup true x + n = sup true (λ { tt → x tt + n })


Similarly, the type of binary trees is 


BTree : Set
BTree = W Bool (λ { false → ⊥ ; true → Bool })
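
The same encoding of natural numbers can be approximated in OCaml. The
following is only a rough sketch: since OCaml has no dependent types, the type
of argument positions cannot depend on the constructor, so we use unit for both
constructors and simply never call the argument function of zero. The names w,
nat, zero, succ, add and to_int are ours.

```ocaml
(* sup: a constructor label together with a function providing its arguments *)
type ('a, 'b) w = Sup of 'a * ('b -> ('a, 'b) w)

(* Natural numbers: false stands for zero and true for suc *)
type nat = (bool, unit) w

(* zero has no argument, so its argument function is never supposed to be called *)
let zero : nat = Sup (false, fun () -> failwith "zero has no argument")
let succ (n : nat) : nat = Sup (true, fun () -> n)

(* Addition, by recursion on the first argument *)
let rec add (m : nat) (n : nat) : nat =
  match m with
  | Sup (false, _) -> n
  | Sup (true, x) -> Sup (true, fun () -> add (x ()) n)

(* Conversion to machine integers, only used to observe results *)
let rec to_int (Sup (b, x) : nat) : int = if b then 1 + to_int (x ()) else 0
```

For instance, to_int (add (succ (succ zero)) (succ zero)) evaluates to 3.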


Encoding into W-types. The class of types which we can handle looks quite
restricted because the arguments of constructors can only be of the W-type
itself. It is actually not, thanks to the extra generality brought by the possibility
of having arbitrary types as A and B(x), and not only finite types. For instance,
the type of lists

data List (A : Set) : Set where
  nil  : List A
  cons : A → List A → List A


is not obviously a W-type because the constructor cons takes an argument of 
type A, whereas we are trying to define List A, and thus the arguments of 
constructors should have this type. However, instead of thinking of cons as one 
constructor, we can think of it as an infinite family of constructors cons a, one 
for each element a of A, each of which is taking one argument of type List A. In 
this way, it is natural to take Maybe A as the type of constructors where nothing 
corresponds to the constructor nil and just a corresponds to cons a, and we 
define 


List : (A : Set) → Set
List A = W (Maybe A) (λ { nothing → ⊥ ; (just x) → ⊤ })


8.4.2 Rules for W-types. In order to add support for W-types, one should
add the following constructions to expressions:

\[ e ::= \ldots \mid W(x : e).e' \mid \mathrm{sup}(e, e') \mid \mathrm{Wrec}(e, x \mapsto e', xyz \mapsto e'') \]

where W(x : A).B is the W-type constructor, sup(t, u) constructs an element of
a W-type with t as constructor and u as function specifying arguments, and
Wrec(t, x ↦ C, xyz ↦ u) eliminates an element t of a W-type and produces an
element of type C.


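Formation.

\[ \frac{\Gamma, x : A \vdash B : \mathrm{Type}}{\Gamma \vdash W(x : A).B : \mathrm{Type}} \; (W_F) \]

Introduction.

\[ \frac{\Gamma \vdash t : A \qquad \Gamma \vdash u : B[t/x] \to W(x : A).B}{\Gamma \vdash \mathrm{sup}(t, u) : W(x : A).B} \; (W_I) \]

Elimination.

\[ \frac{\begin{array}{c}
\Gamma \vdash t : W(x : A).B \qquad \Gamma, x : W(x : A).B \vdash C : \mathrm{Type} \\
\Gamma, x : A, y : B \to W(x : A).B, z : \Pi(w : B).C[(y\,w)/x] \vdash u : C[\mathrm{sup}(x, y)/x]
\end{array}}{\Gamma \vdash \mathrm{Wrec}(t, x \mapsto C, xyz \mapsto u) : C[t/x]} \; (W_E) \]

Computation.

\[ \frac{\begin{array}{c}
\Gamma \vdash t : A \qquad \Gamma, x : W(x : A).B \vdash C : \mathrm{Type} \qquad \Gamma \vdash u : B[t/x] \to W(x : A).B \\
\Gamma, x : A, y : B \to W(x : A).B, z : \Pi(w : B).C[(y\,w)/x] \vdash v : C[\mathrm{sup}(x, y)/x]
\end{array}}{\Gamma \vdash \mathrm{Wrec}(\mathrm{sup}(t, u), x \mapsto C, xyz \mapsto v) = v[t/x, u/y, \lambda w.\mathrm{Wrec}(u\,w, x \mapsto C, xyz \mapsto v)/z] : C[\mathrm{sup}(t, u)/x]} \; (W_C) \]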


Uniqueness. This is not usually considered and requires function extensionality. 
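
To make the computation rule more concrete, here is a minimal non-dependent
sketch in OCaml. It is only an approximation: OCaml has no dependent types, so
the type of argument positions cannot depend on the constructor, the motive C
becomes a fixed result type, and the names w, wrec, nil, cons and length are
ours. The example uses the encoding of lists described above, with Maybe A
rendered as 'a option and argument positions rendered as unit.

```ocaml
(* An element of a W-type: a constructor label and a function giving its arguments.
   In this sketch the type 'b of argument positions does not depend on the label. *)
type ('a, 'b) w = Sup of 'a * ('b -> ('a, 'b) w)

(* Non-dependent recursor, following the computation rule:
   Wrec(sup(t,u), xyz -> v) = v[t/x, u/y, (fun w -> Wrec(u w, xyz -> v))/z] *)
let rec wrec (Sup (t, u)) v = v t u (fun w -> wrec (u w) v)

(* Lists: None plays the role of nil and Some a the role of cons a.
   nil has no argument, so its argument function is never supposed to be called. *)
let nil = Sup (None, fun () -> failwith "nil has no argument")
let cons a l = Sup (Some a, fun () -> l)

(* Length by recursion: z () is the result of the recursive call on the tail *)
let length l =
  wrec l (fun c _ z -> match c with None -> 0 | Some _ -> 1 + z ())
```

Here the third argument of the step function corresponds to the variable z of
the rule: it gives access to the results of the recursive calls on the arguments.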


8.4.3 More inductive types. W-types are very fine if you want to perform 
a clean and easy implementation of inductive types, or want to study metathe- 
oretic properties of types. In practice, proof assistants have more involved 
implementations of inductive types. One reason is user-friendliness: we want 
to be able to give nice names for constructors, have a nice syntax for pattern 
matching, generate pattern-matching cases automatically, etc. Also, we do not
want users to have to explicitly encode their types into W-types, and more
generally we want to implement extensions of W-types. The interested reader is
advised to look at good descriptions of actual inductive types in Agda [Nor07],
in Coq [PM93], or in theory [Dyb94]. We list below some common extensions of
inductive types.


Indexed W-types. A first generalization of the notion of W-type is the support 
for indices. For instance, the type of finite sets is defined as 


data Fin : ℕ → Set where
  zero : {n : ℕ} → Fin (suc n)
  suc  : {n : ℕ} → Fin n → Fin (suc n)

so that Fin n is a type with n elements. Here, the type takes a natural number n
as argument, and various values for this argument are needed for constructors,
e.g. suc needs an argument of type Fin n to produce a Fin (n+1).

The definition of W-types can be modified in order to account for indices as 
follows. We only give here the implementation in Agda: 


data W (I : Set) (A : I → Set) (B : (i : I) → A i → I → Set) : I → Set where
  sup : (i : I) (a : A i) → ((j : I) → B i a j → W I A B j) → W I A B i


In this type, I is the type for indices, A i is the type indicating the constructors
with index i, and B i a j indicates the number of arguments of index j of the
constructor a.


Example 8.4.3.1. For instance, in the case of Fin,

- I is the type of natural numbers,

- A 0 is the empty type ⊥ (there is no constructor for Fin 0) and, for i > 0,
  A i is the type Bool with two elements (there are two constructors for
  Fin i: respectively zero and suc),

- for indices i and j,

  - the constructor zero of type Fin j takes zero arguments of type Fin i,

  - the constructor suc of type Fin j takes one argument of type Fin i
    when suc i is j, and zero arguments otherwise,

which determines the types B i a j.

We thus define the type A as


A : ℕ → Set
A zero = ⊥
A (suc n) = Bool


the type B as


B : (n : ℕ) → A n → ℕ → Set
B (suc n) false m = ⊥
B (suc n) true m with n ≟ m
B (suc n) true m | yes _ = ⊤
B (suc n) true m | no _ = ⊥


and finally, the type of finite sets as


Fin : ℕ → Set
Fin n = W ℕ A B n


Exercise 8.4.3.2. Define the types Vec A n of vectors of length n containing
elements of type A using indexed W-types.


Mutually inductive types. One might want to define two inductive types which
mutually depend on each other. For instance, trees and forests can be defined in
a mutually inductive fashion as follows:


data Tree : Set
data Forest : Set


data Tree where
  leaf : Tree
  node : Forest → Tree


data Forest where
  nil  : Forest
  cons : Tree → Forest → Forest


A tree takes a forest as argument and a forest is a list of trees (although we 
do not use the inductive type for lists here and define a new one adapted to 
forests). 


Nested inductive types. One might want to define inductive types in which
arguments are other inductive types applied to the type itself. For instance, trees can
also be defined as nodes taking lists of trees as argument, lists being themselves
defined as an inductive type:


open import Data.List


data Tree : Set where
  nil  : Tree
  node : List Tree → Tree


Inductive-inductive types. One might want to define both

- an inductive type A and
- a predicate on A (i.e. a function A → Type)

whose definitions mutually depend on each other. For instance, the type of
sorted lists can be defined along the predicate _≤*_ (where x ≤* l means that x
is below every element of the list l, see section 6.7.2) as follows. In Agda, we
first have to declare the type of the two definitions by


data SortedList : Set
data _≤*_ : ℕ → SortedList → Set


and we can then define both types by mutual induction by


data SortedList where
  empty : SortedList
  cons  : (x : ℕ) (l : SortedList) (le : x ≤* l) → SortedList


data _≤*_ where
  ≤*-empty : {x : ℕ} → x ≤* empty
  ≤*-cons  : {x y : ℕ} {l : SortedList} →
             x ≤ y → (le : y ≤* l) → x ≤* (cons y l le)


See figure 6.3 for an application of those definitions. Such types are called
inductive-inductive types [FS12].


Coinductive types. Inductive types are defined as a smallest fixpoint, see sec- 
tion 1.3.3. For instance, the type of natural numbers is the smallest type con- 
taining zero and closed under successor. It is also possible to consider greatest 
fixpoints, and the resulting types are called coinductive types. 


8.4.4 The positivity condition. When adding more general forms of induc- 
tive types, one should be very careful. Adding seemingly useful or natural 
inductive types can make the system inconsistent. 


Inconsistent inductive types. As an illustration, consider inductive types where 
the arguments of constructors are types built from basic types, the inductive 
type we are defining, and arrows. For instance, with this formalism, the type of 
binary trees could be implemented in Agda as 


data BTree : Set where
  leaf : BTree
  node : (Bool → BTree) → BTree


where the argument of the node constructor is Bool → BTree, which is an arrow
from a basic type (Bool) to the inductively defined type (BTree): given a function
f of this type, f false indicates the first child and f true indicates the second
child of the node.

Such inductive types also allow for a very natural implementation of λ-terms.
Namely, since Agda already implements λ-calculus (α-conversion, β-reduction,
etc.), we would like to use this instead of explicitly redefining those. One way
to do this is to observe that the only thing we can do with an abstraction λx.t is
to β-reduce it, and therefore implement it as the function which to a λ-term u
associates the term t[u/x]; this is normalization by evaluation, which is detailed
in section 3.5.2. This suggests implementing λ-terms as the type


data Term : Set where
  abs : (Term → Term) → Term


(we should also add a constructor for variables, which we did not do here since
it will play no role in the following explanation) and application as


app : Term → Term → Term
app (abs f) t = f t


However, remembering the course about λ-calculus in section 3.2.6, we start
feeling bad because we remember that we can define a looping λ-term as


loop : Term
loop = app ω ω


where ω is defined as


ω : Term
ω = abs (λ x → app x x)


which contradicts the postulate that all terms should be terminating in Agda.
Indeed, if we consider the small variation where we define terms


data Term : Set where
  abs : (Term → ⊥) → Term


then app has type Term → Term → ⊥ and loop is a proof of ⊥, i.e. our logic is
inconsistent!
The proof can further be simplified by defining


data Bad : Set where
  bad : (Bad → ⊥) → Bad


(we are simply giving another name to the new Term here), which is now thought
of as a type equivalent to its own negation, thus allowing us to prove ⊥. Namely,
we can show the negation of this type by


not-Bad : Bad → ⊥
not-Bad (bad f) = f (bad f)


we construct a proof of the type


is-Bad : Bad
is-Bad = bad not-Bad


and thus conclude with an inconsistency:


absurd : ⊥
absurd = not-Bad is-Bad
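
For comparison, OCaml performs no positivity check, so the same definition is
accepted there and yields a function that keeps calling itself without any
let rec. The following is only an illustrative sketch: the empty type (which
requires a reasonably recent OCaml), the call counter and the Out_of_fuel
exception are our additions, the latter two being there only so that the
example can be run and interrupted.

```ocaml
(* An empty type, playing the role of ⊥ *)
type empty = |

(* Accepted by OCaml although bad occurs to the left of an arrow *)
type bad = Bad of (bad -> empty)

exception Out_of_fuel

(* Counts the calls to not_bad, only to be able to interrupt the loop *)
let calls = ref 0

(* Not syntactically recursive, yet it ends up calling itself through f *)
let not_bad (Bad f as b) : empty =
  incr calls;
  if !calls > 1000 then raise Out_of_fuel else f b

let is_bad = Bad not_bad

(* Without the fuel check, not_bad is_bad would never return *)
let loops () =
  try ignore (not_bad is_bad); false with Out_of_fuel -> true
```

In a language where all programs must terminate, such a term of type empty
cannot be allowed, which is why the definition of the type itself has to be
rejected.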


The positivity condition. In practice, when defining the type Bad in Agda, we 
get an error message stating that 


Bad is not strictly positive, because it occurs to the left of an 
arrow in the type of the constructor bad in the definition of Bad. 


This message indicates that our type is rejected, thus preventing the logic from 
being inconsistent, because it does not satisfy the “strict positivity condition” 
explained below. In order to test the above examples, you can however disable 
this check by writing 


{-# NO_POSITIVITY_CHECK #-} 


just before the definition of the type. 

In order to build intuition, first consider traditional functions between sets.
We write A ⇒ B for the set of all functions from a set A to a set B. Given sets
A, B and B', it can be noted that

\[ B \subseteq B' \quad\text{implies}\quad (A \Rightarrow B) \subseteq (A \Rightarrow B') \]

Namely, given a function f : A → B, the image of every element of A is an
element of B and thus of B', i.e. f is a function from A to B'. However, on the
left of arrows, the situation is reversed: for sets A, A' and B, we rather have

\[ A \subseteq A' \quad\text{implies}\quad (A \Rightarrow B) \supseteq (A' \Rightarrow B) \]

Namely, any function defined for every element of A' is in particular defined
for every element of A. Because of this behavior, the arrow types are said to
be covariant in B and contravariant in A; we also say that A varies negatively
and B varies positively in A ⇒ B. Traditionally, inductive types are obtained by
“adding elements” to the type. For instance, natural numbers contain zero and 
for every natural number n, we add a new natural number, its successor. Now, 
if some constructors have negative occurrences of the inductively defined type, 
when adding more elements we should also remove some elements, because the 
constructor is contravariant, and the meaning of the inductively defined type is 
not clear at all. In terms of the formalization described in section 1.3.3, this 
means that the function induced by the description of the inductive type might 
not be increasing, so that we have no guarantee that it should have a smallest 
fixpoint. 
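
The same variance can be observed in the subtyping of OCaml. In the sketch
below, which only serves as an illustration, the polymorphic variant types
small and big (our names) play the role of sets with small included in big, and
the explicit coercions are accepted precisely in the two directions described
above.

```ocaml
type small = [ `A ]
type big = [ `A | `B ]

(* covariance on the right: (int -> small) is included in (int -> big) *)
let widen_result (f : int -> small) : int -> big =
  (f : int -> small :> int -> big)

(* contravariance on the left: (big -> int) is included in (small -> int) *)
let narrow_argument (g : big -> int) : small -> int =
  (g : big -> int :> small -> int)
```

The coercions in the opposite directions, e.g. from int -> big to int -> small,
are rejected by the typechecker.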

The polarity (positive or negative) of a type can be defined as follows. For 
simplicity, we consider types of the form 


\[ A, B ::= X \mid A \to B \]

consisting either of a variable or an arrow. Given a type A, the polarity of a
type which is a subterm of A, is defined by induction on A by

- the polarity of A is positive,

- in a type B → C, the polarity of C is the same as the polarity of B → C,

- in a type B → C, the polarity of B is the opposite of the polarity
  of B → C.


In other terms, the polarity at toplevel is positive, stays the same when we go to 
the right of an arrow and changes when we go to the left of an arrow. An Agda 
formalization of this is given in figure 8.1. A type is strictly positive when it is 
positive and we did not encounter negative types when computing the polarity. 


Example 8.4.4.1. For instance, in the type

\[ A \to ((B \to C) \to (D \to E)) \]

the types A, C and D are negative and B and E are positive. The syntactic tree
of the type can be written as follows

              → (+)
             /     \
         A (-)     → (+)
                  /     \
              → (-)     → (+)
              /   \     /   \
          B (+) C (-) D (-) E (+)

where + or - indicate the polarity of subtrees (positive or negative). The type E
is strictly positive, but the type B is not strictly positive, because we computed
its polarity by

- A → ((B → C) → (D → E)) is positive, thus
- (B → C) → (D → E) is positive, thus
- B → C is negative, thus
- B is positive,

and we encountered negative types.


-- Polarities
data Polarity : Set where
  pos : Polarity
  neg : Polarity

-- Opposite of a polarity
op : Polarity → Polarity
op pos = neg
op neg = pos

-- Types
data Type : Set where
  var : ℕ → Type
  arr : Type → Type → Type

-- Subterm relation on types
data _<_ : Type → Type → Set where
  top   : {A : Type} → A < A
  left  : {A A' B : Type} → A < A' → A < arr A' B
  right : {A B B' : Type} → B < B' → B < arr A B'

-- Polarity of a type A in a type B
polarity : {A B : Type} → A < B → Polarity
polarity top = pos
polarity (left p) = op (polarity p)
polarity (right p) = polarity p

Figure 8.1: Polarities of types in Agda.


Agda (like most other proof assistants, such as Coq) implements the following
restriction on inductive types: given a constructor of an inductive type A, if A
occurs in the type of an argument of the constructor, then it must do so strictly
positively. For instance, Bad above is rejected because the constructor bad takes one
argument of type Bad → ⊥, where Bad occurs negatively. The above
counter-example explains why we must forbid negative occurrences. The reason why
we must further restrict to strictly positive occurrences is explained for Coq
in [CP88]; its usefulness in Agda is not so clear [Coq13], which is why we did
not provide a counter-example.
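
The polarity computation is easy to sketch in OCaml as well. The code below is
an illustration under our own naming (occurrences, strictly_positive) for the
simple types made of variables and arrows considered above. Following the
description given earlier, an occurrence is taken to be strictly positive when
we never went to the left of an arrow on the way to it. Actual proof assistants
perform a more involved check on full dependent types.

```ocaml
type polarity = Pos | Neg

let op = function Pos -> Neg | Neg -> Pos

(* Simple types: variables and arrows *)
type ty = Var of string | Arr of ty * ty

(* Lists each variable occurrence with its polarity and whether it is strictly
   positive, i.e. reached without ever going to the left of an arrow *)
let occurrences (t : ty) : (string * polarity * bool) list =
  let rec go pol strict = function
    | Var x -> [ (x, pol, strict) ]
    | Arr (a, b) -> go (op pol) false a @ go pol strict b
  in
  go Pos true t

(* Does the variable x only occur strictly positively in t? *)
let strictly_positive x t =
  List.for_all (fun (y, _, strict) -> y <> x || strict) (occurrences t)

(* The type of Example 8.4.4.1: A -> ((B -> C) -> (D -> E)) *)
let example =
  Arr (Var "A", Arr (Arr (Var "B", Var "C"), Arr (Var "D", Var "E")))
```

On the example, B is reported as positive but not strictly positive, and E as
strictly positive, in accordance with the discussion above.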


8.4.5 Disjointedness and injectivity of constructors. We present here two 
important properties of inductive type constructors in Agda related to equality. 


Disjointedness of constructors. Constructors are disjoint, meaning that values 
made using two different constructors are necessarily different. For instance, 
over natural numbers, zero cannot be equal to the successor of some number: 


zero-suc : {n : ℕ} → zero ≡ suc n → ⊥
zero-suc ()


Here, the empty pattern () means that Agda should check by itself that the
case where zero is equal to suc n cannot happen, which it does thanks to the
disjointedness assumption.


Injectivity of constructors. Constructors are injective, meaning that if two con- 
structed values are equal then the arguments are equal. For instance: 


suc-injective : {m n : ℕ} → suc m ≡ suc n → m ≡ n
suc-injective refl = refl


(this could also be shown directly using cong).


Injectivity of type constructors. Type constructors are not injective by default.
For instance, the following does not typecheck:


list-inj : {A B : Set} → List A ≡ List B → A ≡ B
list-inj refl = refl


We can however explicitly ask Agda to make type constructors injective, by 
adding the following pragma at the beginning of the file: 


{-# OPTIONS --injective-type-constructors #-} 


The reason why it is not enabled by default is that it makes the system
inconsistent together with the excluded middle, which prevents us from safely working
in classical logic. A counter-example was found by Hur based on the following
observation [Hur10]. We can define an inductive data type of the form


data I : (Set → Set) → Set where


(the constructors will not matter, so we might as well choose to have none). The
injectivity of the type constructor I amounts to having an injection of Set → Set
into Set, which is excluded by a diagonal argument à la Cantor, see appendix A.4.
Namely, with injective type constructors, we can show


inj : {x x' : Set → Set} → I x ≡ I x' → x ≡ x'
inj refl = refl


In order to use the Cantor diagonal argument formalized in appendix A.4.2, we
have to show that Set contains two distinct types, say ⊤ and ⊥,


⊤≢⊥ : ¬ (⊤ ≡ ⊥)
⊤≢⊥ p = subst (λ A → A) p tt


and suppose that the law of excluded middle holds


postulate lem : (A : Set₁) → Dec A


We finally conclude


absurd : ⊥
absurd = Cantor.no-injection ⊤≢⊥ lem I inj

8.5 Implementing type theory 


We now explain how to implement a typechecker in dependent type theory in a
reasonably efficient and principled way. We chose to implement a type theory
with Π-types, natural numbers and identity types in order to show most of the
principles needed in order to implement a typechecker. Identity types will be
presented in section 9.1, and you can simply ignore them if you have not yet
read this part of the book. We could have more generally implemented inductive
types (or, at least, W-types) in a similar way [CKNT09], but felt that the
code would be more readable when specialized to natural numbers and identity
types. The version given here is a variant of the standard implementations for
dependent types [Coq96, GL02, CKNT09, LMS10, Bau12].

The basic idea is to implement a bidirectional typechecking algorithm, similar
to the one we already presented for simply-typed λ-calculus in section 4.4.5:
we try to check that a term has a given type when we have a candidate for
the type, otherwise we try to infer the type of the term. The reason for this is
that we generally declare the type of functions before defining them, but do not
want to annotate each λ-abstraction with the type of the variable. This is for
instance the way things are in Agda. There is a subtlety though: when comparing
expressions (terms or types), we should do so modulo αβ-convertibility.
As detailed in section 4.2.4, by the confluence and termination of the calculus,
we can decide whether two expressions t and u are convertible, by computing
their normal forms (i.e. β-reducing them as much as we can) and checking the
resulting terms for α-convertibility. This means that we should choose a way to
implement β-reduction among the ones presented in section 3.5. The normalization
by evaluation technique (section 3.5.2) is the most suitable for us: it is quite
easy to implement because it relies on the implementation of the β-reduction of
the programming language (OCaml in our case). Moreover, it allows for an efficient
implementation of convertibility: instead of fully performing β-reduction,
we can compute weak head normal forms, so that we can potentially detect
when two terms are not equal without fully reducing them.


8.5.1 Expressions. We begin by formally defining expressions as 


type expr =
  | Var of string (** a variable *)
  | Abs of string * expr (** a lambda-abstraction *)
  | App of expr * expr (** an application *)
  | Pi of string * expr * expr (** a Pi-type *)
  | Type of int (** a universe *)
  | Nat (** type of natural numbers *)
  | Zero (** zero *)
  | Succ of expr (** successor *)
  | Ind of expr * expr * expr * expr (** induction *)
  | Id of expr * expr * expr (** identity type *)
  | Refl of expr (** reflexivity *)
  | J of expr * expr * expr * expr * expr * expr (** id elim *)

An expression is thus either a variable, a λ-abstraction

  $\lambda x.t$    written    Abs(x, t)

an application of an expression to another, a Π-type

  $\Pi(x : a).b$    written    Pi(x, a, b)

a universe of given level, the type of natural numbers, zero, the successor of a
natural number, the induction principle

  $\mathrm{rec}(n, x \mapsto A, z, mr \mapsto s)$    written    Ind(n, Abs(x, a), z, Abs(m, Abs(r, s)))

(note that we use abstractions as arguments of Ind in order to avoid having
to handle α-conversion here, and only take care of it for abstractions, see
section 4.3.3), an identity type

  $\mathrm{Id}_a(t, u)$    written    Id(a, t, u)

a reflexivity proof

  $\mathrm{refl}(t)$    written    Refl(t)

or a J eliminator

  $J(e, xye \mapsto A, x \mapsto r)$    written    J(a, Abs(x, Abs(y, Abs(e, a))), Abs(x, r), t, u, e)


Values. A term will evaluate to a value which is, by definition, a term which
does not reduce anymore. The type corresponding to values is


type value =
  | VAbs of (value -> value) (** a lambda-abstraction *)
  | VPi of value * (value -> value) (** a Pi-type *)
  | VType of int (** a universe *)
  | VNat (** type of natural numbers *)
  | VZero (** zero *)
  | VSucc of value (** successor *)
  | VId of value * value * value (** identity type *)
  | VRefl of value (** reflexivity *)
  | VNeutral of neutral (** a neutral value *)


which roughly corresponds to the definition of expressions, with a few notable
differences, as we now explain. For abstractions (VAbs), the body is not yet
evaluated, because we are computing weak head normal forms: instead, we have
a function which given an argument, will compute the normal form of the body
with the argument substituted as expected. Similarly, a Π-type $\Pi(x : A).B$
is stored in VPi as the type A and the function $\lambda x^A.B$, which provides the
type B given the argument of type A. The last case corresponds to neutral
values: those are expressions in which the computation is not fully performed,
but is stuck because we do not know the value for some variable. For instance,
given a variable x and a term t, the term x t is a value: in order to evaluate
this application, we would need to know the value for x, which should be a
λ-abstraction. Neutral values are defined by the type


and neutral =
  | NVar of string
  | NApp of neutral * value
  | NInd of neutral * value * value * value
  | NJ of value * value * value * value * value * neutral

and thus consist either of a variable, or a neutral value applied to a value
(e.g. x t), or an induction on a neutral value (e.g. an induction on a variable) or
an elimination of a neutral proof of identity.


8.5.2 Evaluation. We can then easily write a function which applies a value u
to another value v. In the case u is an abstraction, we apply it to v. Otherwise,
if we assume that the terms are suitably typed, u has to be a neutral value
(e.g. we cannot apply a natural number to some other term), in which case the
result is still a neutral value:


let vapp u v =
  match u with
  | VAbs f -> f v
  | VNeutral t -> VNeutral (NApp (t, v))
  | _ -> assert false


Thanks to this helper function, we can write a function eval which evaluates
an expression t to a value. The function also takes an environment env, which
is a list of pairs associating to a free variable its value, in the case it is known.

let rec eval env t =
  match t with
  | Var x ->
    (try List.assoc x env with Not_found -> VNeutral (NVar x))
  | Abs (x, e) -> VAbs (fun v -> eval ((x,v)::env) e)
  | App (e1, e2) -> vapp (eval env e1) (eval env e2)
  | Pi (x, a, e) -> VPi (eval env a, fun v -> eval ((x,v)::env) e)
  | Type i -> VType i
  | Nat -> VNat
  | Zero -> VZero
  | Succ e -> VSucc (eval env e)
  | Ind (n, a, z, s) ->
    let n = eval env n in
    let a = eval env a in
    let z = eval env z in
    let s = eval env s in
    let rec f = function
      | VZero -> z
      | VSucc n -> vapp (vapp s n) (f n)
      | VNeutral n -> VNeutral (NInd (n, a, z, s))
      | _ -> assert false
    in
    f n
  | Id (a, t, u) -> VId (eval env a, eval env t, eval env u)
  | Refl t -> VRefl (eval env t)
  | J (a, p, r, t, u, e) ->
    (
      match eval env e with
      (* r is an abstraction over x (it is checked against a Pi-type in the
         typechecker), so we apply it to the endpoint of the reflexivity proof *)
      | VRefl v -> vapp (eval env r) v
      | VNeutral e ->
        VNeutral (NJ (eval env a, eval env p, eval env r,
                      eval env t, eval env u, e))
      | _ -> assert false
    )


As explained above, when evaluating a function (Abs), we return a function 
which will return the value corresponding to the body, provided the argument, 
which is stored in the environment. We use the function vapp in order to evalu- 
ate applications (App). For constructors corresponding to types and introduction 
rules, the function simply consists in evaluating all the arguments of the con- 
structor. For the constructors corresponding to elimination rules, we evaluate 
the argument we are eliminating and then evaluate the construction accordingly. 
For instance, for induction (Ind), we evaluate the natural number n on which 
the induction is applied and compute the result of the induction accordingly, 
depending on whether the result is zero, a successor, or a variable. 
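
To experiment with the evaluation of induction in isolation, here is a reduced,
self-contained sketch restricted to λ-terms and natural numbers. It is a
simplification of the code above and not the implementation itself: the motive
of Ind is omitted (it plays no role during evaluation), and the helpers to_int,
num and add are ours.

```ocaml
type expr =
  | Var of string | Abs of string * expr | App of expr * expr
  | Zero | Succ of expr
  | Ind of expr * expr * expr (* Ind (n, z, s): the motive is omitted here *)

type value =
  | VAbs of (value -> value) | VZero | VSucc of value | VNeutral of neutral
and neutral =
  | NVar of string | NApp of neutral * value | NInd of neutral * value * value

let vapp u v =
  match u with
  | VAbs f -> f v
  | VNeutral t -> VNeutral (NApp (t, v))
  | _ -> assert false

let rec eval env = function
  | Var x -> (try List.assoc x env with Not_found -> VNeutral (NVar x))
  | Abs (x, e) -> VAbs (fun v -> eval ((x, v) :: env) e)
  | App (e1, e2) -> vapp (eval env e1) (eval env e2)
  | Zero -> VZero
  | Succ e -> VSucc (eval env e)
  | Ind (n, z, s) ->
    let z = eval env z and s = eval env s in
    let rec f = function
      | VZero -> z
      | VSucc n -> vapp (vapp s n) (f n)
      | VNeutral n -> VNeutral (NInd (n, z, s)) (* stuck on a variable *)
      | VAbs _ -> assert false
    in
    f (eval env n)

(* Reads a closed numeral back as an integer, when possible *)
let rec to_int = function
  | VZero -> Some 0
  | VSucc n -> Option.map succ (to_int n)
  | _ -> None

(* The numeral n, and addition defined by induction on the first argument *)
let rec num n = if n = 0 then Zero else Succ (num (n - 1))
let add m n = Ind (m, n, Abs ("m", Abs ("r", Succ (Var "r"))))
```

Evaluating add (num 2) (num 3) in the empty environment gives the numeral 5,
whereas evaluating add (Var "x") (num 1) gives a neutral value, the induction
being stuck on the variable x.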


8.5.3 Convertibility. Our goal is now to decide the convertibility of expressions.
As explained above, this is basically performed by evaluating expressions
to values and then comparing the resulting values for equality (we call veq
the function which compares two values). However, since values may contain
functions (under the VAbs constructors), we first have to implement a readback
function which will convert a value into an expression, following the same
techniques as in section 3.5.2.


Readback. The readback function takes as arguments a natural number k (to
generate fresh variables) and a value, and produces an expression:


let rec readback k v =
  let rec neutral k = function
    | NVar x ->
      Var x
    | NApp (t, u) ->
      App (neutral k t, readback k u)
    | NInd (n, a, z, s) ->
      Ind (neutral k n, readback k a, readback k z, readback k s)
    | NJ (a, p, r, t, u, e) ->
      J (readback k a, readback k p, readback k r,
         readback k t, readback k u, neutral k e)
  in
  match v with
  | VAbs f ->
    let x = fresh k in
    Abs (x, readback (k+1) (f (var x)))
  | VPi (a, b) ->
    let x = fresh k in
    Pi (x, readback k a, readback (k+1) (b (var x)))
  | VType i -> Type i
  | VNat -> Nat
  | VZero -> Zero
  | VSucc n -> Succ (readback k n)
  | VId (a, t, u) ->
    Id (readback k a, readback k t, readback k u)
  | VRefl t -> Refl (readback k t)
  | VNeutral t -> neutral k t


This function essentially consists in translating the constructors of value into 
the corresponding constructors of expr. The only subtlety can be found in the 
case of VAbs (and VPi which is similar). In order to generate the expression 
corresponding to an abstraction VAbs f, we apply f to a fresh variable, whose 
name is generated thanks to the natural number k. Namely, we use the following 
helper function to construct a “fresh” variable name with index k: 


let fresh k = "x@"^string_of_int k


(we suppose that the user will never input a variable name containing the char- 
acter @). Above, the function var is a shorthand to construct a variable with 
given name: 


let var x = VNeutral (NVar x) 


Equality of values. Because of the way the readback function is implemented, by
canonically generating variable names when needed using an index k, two values
will be α-convertible when they have the same readback. We can therefore test
the equality of two values t and u with the following function:


let veq k t u = readback k t = readback k u 
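
The whole pipeline (evaluation, readback, comparison) can be tried on a small
scale. The following self-contained sketch restricts the definitions above to
pure λ-terms; the functions normalize and convertible are our own names for
the composites, and the rest is a cut-down copy of the code of this section.

```ocaml
type expr = Var of string | Abs of string * expr | App of expr * expr

type value = VAbs of (value -> value) | VNeutral of neutral
and neutral = NVar of string | NApp of neutral * value

let vapp u v =
  match u with
  | VAbs f -> f v
  | VNeutral t -> VNeutral (NApp (t, v))

let rec eval env = function
  | Var x -> (try List.assoc x env with Not_found -> VNeutral (NVar x))
  | Abs (x, e) -> VAbs (fun v -> eval ((x, v) :: env) e)
  | App (e1, e2) -> vapp (eval env e1) (eval env e2)

(* Canonical fresh names, assuming user variables never contain '@' *)
let fresh k = "x@" ^ string_of_int k
let var x = VNeutral (NVar x)

let rec readback k = function
  | VAbs f ->
    let x = fresh k in
    Abs (x, readback (k + 1) (f (var x)))
  | VNeutral t -> neutral k t
and neutral k = function
  | NVar x -> Var x
  | NApp (t, u) -> App (neutral k t, readback k u)

(* Normal form of a term, with canonically named bound variables *)
let normalize t = readback 0 (eval [] t)

(* Two terms are convertible when their normal forms coincide *)
let convertible t u = normalize t = normalize u
```

With these definitions, λx.x and λy.y are found convertible because both are
read back as the same expression with bound variable x@0.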


More efficient equality. The above test for equality of values is not very
efficient: it essentially requires evaluating the whole term, which can be very costly,
whereas this is unnecessary when the two terms are not equal. For instance, 
the two terms VAbs f and VZero are not equal, and there is no need to proceed 
to the evaluation of f in order to determine this. The following refined test for 
equality takes this into account: it combines both readback and comparison, 
and amounts to computing the weak head normal forms of the two terms (see 
section 3.5.1) in order to compare them, and only evaluating under abstractions 
if the two weak head normal forms are abstractions. 


let rec veq k t u =
  let rec neq k t u =
    match t, u with
    | NVar x, NVar y -> x = y
    | NApp (t, v), NApp (t', v') ->
      neq k t t' && veq k v v'
    | NInd (n, a, z, s), NInd (n', a', z', s') ->
      neq k n n' && veq k a a' && veq k z z' && veq k s s'
    | NJ (a, p, r, t, u, e), NJ (a', p', r', t', u', e') ->
      veq k a a' && veq k p p' && veq k r r' &&
      veq k t t' && veq k u u' && neq k e e'
    | _, _ -> false
  in
  match t, u with
  | VAbs f, VAbs g ->
    let x = var (fresh k) in
    veq (k+1) (f x) (g x)
  | VPi (a, b), VPi (a', b') ->
    let x = var (fresh k) in
    veq k a a' && veq (k+1) (b x) (b' x)
  | VType i, VType j -> i = j
  | VNeutral t, VNeutral u -> neq k t u
  | VNat, VNat -> true
  | VZero, VZero -> true
  | VSucc t, VSucc u -> veq k t u
  | VId (a, t, u), VId (a', t', u') ->
    veq k a a' && veq k t t' && veq k u u'
  | VRefl t, VRefl u -> veq k t u
  | _, _ -> false


The helper function neq compares neutral values for equality. 


Exercise 8.5.3.1. Modify this function in order to compare values for
η-equivalence. You should start by adding a new argument to the function which is
the common type of the two values. See also exercise 7.5.3.1.




8.5.4 Typechecking. Finally, we can implement a type inference function 
infer as follows. We follow here the principles of bidirectional typechecking 
and define it at the same time (by mutual recursion) as one performing type 
checking, i.e. this is quite similar to the developments of section 4.4.5. The type 
inference function takes as argument an index k for generating fresh variables 
as above, a typing environment tenv associating to variable names a type, an 
environment env associating to variable names a value, and a term t whose type 
we would like to determine. This function is essentially a translation to OCaml
of the natural deduction rules of sections 8.1, 8.3 and 9.1.3:


let rec infer k tenv env t =
  match t with
  | Var x ->
    (
      try List.assoc x tenv
      with Not_found -> raise (Unbound_variable x)
    )
  | App (t, u) ->
    (
      match infer k tenv env t with
      | VPi (a, b) ->
        check k tenv env u a;
        b (eval env u)
      | _ -> raise Type_error
    )
  | Pi (x, a, b) ->
    let i = universe k tenv env a in
    let a = eval env a in
    let j = universe k ((x,a)::tenv) env b in
    VType (max i j)
  | Type i -> VType (i+1)
  | Nat -> VType 0
  | Zero -> VNat
  | Succ t ->
    check k tenv env t VNat;
    VNat
  | Ind (n, a, z, s) ->
    (
      check k tenv env n VNat;
      match eval env a with
      | VPi (VNat, a) ->
        let n = eval env n in
        check k tenv env z (a VZero);
        check k tenv env s
          (VPi (VNat, fun n -> varr (a n) (a (VSucc n))));
        a n
      | _ -> raise Type_error
    )
  | Id (a, t, u) ->
    let i = universe k tenv env a in
    let a = eval env a in
    check k tenv env t a;
    check k tenv env u a;
    VType i
  | Refl t ->
    let a = infer k tenv env t in
    let t = eval env t in
    VId (a, t, t)
  | J (a, p, r, t, u, e) ->
    let i = universe k tenv env a in
    let a = eval env a in
    check k tenv env p
      (VPi (a, fun x ->
           VPi (a, fun y -> varr (VId (a, x, y)) (VType i))));
    let p = eval env p in
    let p x y e = vapp (vapp (vapp p x) y) e in
    check k tenv env r (VPi (a, fun x -> p x x (VRefl x)));
    check k tenv env t a;
    check k tenv env u a;
    let t = eval env t in
    let u = eval env u in
    check k tenv env e (VId (a, t, u));
    let e = eval env e in
    p t u e
  | Abs _ -> raise Type_error


This function raises an error Unbound_variable when an undeclared variable 
is used and Type_error when the expression does not typecheck. It uses the 
following helper function to construct an arrow type, as a non-dependent Π-type:

let varr a b = VPi (a, fun _ -> b)


This function is defined by mutual induction with a function which checks that 
an expression is a type and returns its universe level: 


and universe k tenv env t =
  match infer k tenv env t with
  | VType i -> i
  | _ -> raise Type_error


and with a function which checks that a term t has a given type a: 


and check k tenv env t a : unit =
  match t, a with
  | Abs (x, t), VPi (a, b) ->
    let y = var (fresh k) in
    check (k+1) ((x,a)::tenv) ((x,y)::env) t (b y)
  | Refl t, VId (_, u, v) ->
    let t = eval env t in
    if not (veq k t u) then raise Type_error;
    if not (veq k t v) then raise Type_error
  | t, a ->
    let a' = infer k tenv env t in
    if not (veq k a a') then raise Type_error


Note that the case where the term is an abstraction λx.t (constructor Abs) and
the type is a Π-type Π(y : A).B (constructor VPi) is subtle: when checking that
the body t has type B, we do so after replacing both x and y by the same fresh
variable name.


8.5.5 Testing. In order to test our implementation, we can check that the 
addition has the type A = Nat → Nat → Nat:


let () =
  let a = varr VNat (varr VNat VNat) in
  let t =
    Abs (
      "m",
      Ind (
        Var "m",
        Pi ("_", Nat, Pi ("_", Nat, Nat)),
        Abs ("n", Var "n"),
        Abs ("m",
             Abs ("r",
                  Abs ("n", Succ (App (Var "r", Var "n")))))
      )
    )
  in
  check 0 [] [] t a


Of course, it is not reasonable to proceed in this way in order to use the im- 
plementation and one should implement a proper lexer and parser. We do not 
describe this part here since it is out of the scope of this book. 


CHAPTER 9 


Homotopy type theory 


In the introduction of chapter 2, we have motivated the exploration of intu- 
itionistic logic by changing the intuitive meaning we give to types: instead of 
thinking of them as booleans, it is much more satisfactory to consider that they 
should be interpreted as sets, where it makes sense to consider various elements 
of a type. Namely, the boolean interpretation is too limited because when a 
type A is not false (i.e. empty) there is only one reason why this could be: A is 
necessarily true (i.e. the set with one element) and in this case there is only one
proof of A (the only element of the set). In this sense, the boolean interpretation 
does not allow for considering the possibility that a type should admit various 
proofs. Now, if we try to make sense of equality in type theory, we discover 
that the set-theoretic interpretation of logic suffers from the same limitations. 
Namely, in a set, when two elements x and y are equal there is only one reason 
why this could possibly be: this is because x is the same as y. 

This suggests changing once again the semantics we give to types and inter- 
pret them, not as booleans, not as sets, but as spaces. In this interpretation, 
proofs of equality correspond to paths, and we can thus conceive of models 
where there can be various ones. Homotopy type theory is dependent type the- 
ory seen from this point of view, and was introduced by Awodey and Voevodsky 
in the 2000’s. The latter discovered that an additional axiom, called univalence, 
was required for the logic to match the situation in spaces, and homotopy type 
theory is usually understood with this axiom. 

This makes the mathematician happy because he discovers that logic is se- 
cretly all about geometry. We will not dive too far in this direction because 
this would require introducing too much material and this is already wonderfully
covered in [Uni13]. This also should make the computer scientist happy,
because it allows for a clean handling of isomorphic data structures. Namely, it 
often occurs that we have the choice between various isomorphic ways of rep- 
resenting data, for instance lists or arrays, and we would like to automatically 
transfer the properties of one to the other. We will see that univalence allows 
this: two isomorphic types will be equal and we will thus be able to transport 
functions from one to the other. 

The reader interested in learning more about the topic is urged to read the
foundational book [Uni13]. The course notes of Altenkirch [Alt19]
and Escardó [Esc19] are also very helpful.

We begin by introducing identity types in section 9.1, explain how types 
can be interpreted as spaces in section 9.2, discuss the classification of types as 
n-types in section 9.3, introduce the univalence axiom in section 9.4 and present 
higher inductive types in section 9.5. 

9.1 Identity types


9.1.1 Definitional and propositional equality. In type theory, we have two 
notions of equality. 


— The definitional equality states that some terms cannot be distinguished: 
this is the “=” relation in the inference rules, which corresponds to identifying
terms under β-equivalence (or generalizations of it to terms with
constructors other than λ-abstractions).


— The propositional equality or identity is a particular type expressing the 
fact that we consider two terms as equal. 


In Agda, there is no notation for definitional equality, because there is simply 
no way to distinguish between two definitionally equal terms. On the other 
hand, t ≡ u expresses propositional equality between two terms t and u: we
can provide a proof of such a fact and reason about it, but, when using u in 
place of t, we should perform some explicit manipulations (e.g. with subst) in 
order to explain to Agda that we can replace one by the other. 


For instance, consider the usual definition of addition, see section 6.4.2: 


_+_ : ℕ → ℕ → ℕ
zero + n = n
suc m + n = suc (m + n)


The terms zero + n and n are definitionally equal: the second line in the above
definition explicitly states that this should be the case. For this reason, the two
can be used interchangeably and, for instance, we can give a vector of length
zero + n where a vector of length n is expected. In contrast, the terms n +
zero and n are not definitionally equal (there is no line in the definition of the
addition which explicitly states that this should be the case), but we can show
that they are propositionally equal, i.e. n + zero ≡ n, which requires reasoning
on addition (by induction). This is the reason why the proof of left unitality of
addition is so simple


+-zero' : (n : ℕ) → zero + n ≡ n
+-zero' n = refl


whereas the right unitality is more involved 


+-zero : (n : ℕ) → n + zero ≡ n
+-zero zero = refl
+-zero (suc n) = cong suc (+-zero n)
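
The difference between the two notions becomes visible as soon as types depend
on these terms. The following is a minimal sketch, assuming the Agda standard
library is available; the names left and right are ours, chosen for
illustration. A vector indexed by zero + n is accepted as is where length n is
expected, whereas one indexed by n + zero has to be transported explicitly with
subst along the proof +-zero:

```agda
open import Data.Nat using (ℕ; zero; suc; _+_)
open import Data.Vec using (Vec)
open import Relation.Binary.PropositionalEquality using (_≡_; refl; cong; subst)

-- Right unitality, as in the text.
+-zero : (n : ℕ) → n + zero ≡ n
+-zero zero    = refl
+-zero (suc n) = cong suc (+-zero n)

-- zero + n reduces to n by definition of _+_, so no conversion is needed.
left : {A : Set} {n : ℕ} → Vec A (zero + n) → Vec A n
left v = v

-- n + zero does not reduce when n is a variable, so we have to
-- transport along the propositional equality.
right : {A : Set} {n : ℕ} → Vec A (n + zero) → Vec A n
right {A} {n} v = subst (Vec A) (+-zero n) v
```

Replacing the body of right by v alone should be rejected by the typechecker,
which is precisely the distinction discussed above.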


In this chapter, we mostly focus on propositional equality, leaving the def- 
initional one implicit as it should be, and sometimes simply say equality for 
propositional equality. The propositional equality is also referred to as identity
and a type t ≡ u as an identity type.


9.1.2 Propositional equality in Agda. We have already seen in section 6.6 
that the definition of propositional equality is expressed in Agda with the fol- 
lowing inductive type: 

data _≡_ {A : Set} (x : A) : A → Set where
  refl : x ≡ x

It has only one constructor, refl, which expresses the reflexivity of equality: a 
term is equal to itself. In particular, two definitionally equal terms are proposi- 
tionally so. 


9.1.3 The rules. The rules for propositional equality, or identity types, follow 
from the above definition of equality as an inductive type, but can also be 
given directly, as for other connectives. These were first formulated by
Martin-Löf [ML75, MLS84].

We extend the syntax of expressions with 


$$e ::= \ldots \mid \mathrm{Id}_e(e', e'') \mid \mathrm{refl}(e) \mid J(e, xyz \mapsto e', x \mapsto e'')$$

The new constructions are the following:

— the type Id_A(t, u) is called an identity type and expresses the fact that two
terms t and u of type A are equal,

— refl(t) is the reflexivity of t, and

— J is the eliminator for identities.

In the following, we will often simply write t = u instead of Id_A(t, u), in
accordance with Agda’s notation for equality types.


Formation. The formation rule states that we can consider the type of
propositional equalities, or identities, between any two terms t and u of the same
type:

$$\frac{\Gamma \vdash t : A \qquad \Gamma \vdash u : A}{\Gamma \vdash \mathrm{Id}_A(t, u) : \mathrm{Type}}\ (\mathrm{Id}_{\mathrm{F}})$$


Introduction. The constructor refl allows proving the reflexivity of equality on 
a given term t: 


$$\frac{\Gamma \vdash t : A}{\Gamma \vdash \mathrm{refl}(t) : \mathrm{Id}_A(t, t)}\ (\mathrm{Id}_{\mathrm{I}})$$


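Elimination. The eliminator states that in order to prove a property B
depending on a proof p that two terms t and u of type A are equal, it is enough
to give a proof r of it in the case where p is reflexivity:

$$\frac{\begin{array}{c}\Gamma \vdash p : \mathrm{Id}_A(t, u) \qquad \Gamma, x : A, y : A, z : \mathrm{Id}_A(x, y) \vdash B : \mathrm{Type} \\ \Gamma, x : A \vdash r : B[x/x, x/y, \mathrm{refl}(x)/z]\end{array}}{\Gamma \vdash J(p, xyz \mapsto B, x \mapsto r) : B[t/x, u/y, p/z]}\ (\mathrm{Id}_{\mathrm{E}})$$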


Computation. The computation rule expresses the fact that, when we use a 
proof constructed by J in the case where the considered proof of identity is 
reflexivity, we recover the proof r we provided: 
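$$\frac{\begin{array}{c}\Gamma \vdash t : A \qquad \Gamma, x : A, y : A, z : \mathrm{Id}_A(x, y) \vdash B : \mathrm{Type} \\ \Gamma, x : A \vdash r : B[x/x, x/y, \mathrm{refl}(x)/z]\end{array}}{\Gamma \vdash J(\mathrm{refl}(t), xyz \mapsto B, x \mapsto r) = r[t/x] : B[t/x, t/y, \mathrm{refl}(t)/z]}\ (\mathrm{Id}_{\mathrm{C}})$$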

Uniqueness. The uniqueness rule states that any term t depending on an
identity z can be obtained from its restriction to the case where z is the
reflexivity, by using the J rule:

$$\frac{\Gamma, x : A, y : A, z : \mathrm{Id}_A(x, y) \vdash B : \mathrm{Type} \qquad \Gamma, x : A, y : A, z : \mathrm{Id}_A(x, y) \vdash t : B}{\Gamma, x : A, y : A, z : \mathrm{Id}_A(x, y) \vdash J(z, xyz \mapsto B, x \mapsto t[x/y, \mathrm{refl}(x)/z]) = t : B}\ (\mathrm{Id}_{\mathrm{U}})$$


This uniqueness rule, which was present in Martin-Löf’s original system, is
debatable. In particular, it implies that the following rule, sometimes called
equality reflection, which states that two propositionally equal terms are
definitionally so, is admissible:

$$\frac{\Gamma \vdash p : \mathrm{Id}_A(t, u)}{\Gamma \vdash t = u : A}$$

Namely, given a type A in a context Γ, we deduce, using the uniqueness rule,

$$\frac{\Gamma, x : A, y : A, z : \mathrm{Id}_A(x, y) \vdash A : \mathrm{Type} \qquad \Gamma, x : A, y : A, z : \mathrm{Id}_A(x, y) \vdash x : A}{\Gamma, x : A, y : A, z : \mathrm{Id}_A(x, y) \vdash J(z, xyz \mapsto A, x \mapsto x) = x : A}\ (\mathrm{Id}_{\mathrm{U}})$$


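Similarly, applying the uniqueness rule to the term y instead of x (note that
y[x/y, refl(x)/z] is x), we can also deduce, in the same context,

$$J(z, xyz \mapsto A, x \mapsto x) = y$$

and thus x = y by transitivity, i.e. the following rule is admissible:

$$\Gamma, x : A, y : A, z : \mathrm{Id}_A(x, y) \vdash x = y : A$$

We finally obtain the equality reflection rule by substituting t for x, u for y
and p for z. In a similar way, one can show that the rule

$$\Gamma, x : A, y : A, z : \mathrm{Id}_A(x, y) \vdash z = \mathrm{refl}(x) : \mathrm{Id}_A(x, y)$$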


is admissible, i.e. reflexivity is the only possible proof of equality. A type theory 
allowing those rules is called extensional, and has the inconvenient property that 
its typechecking is undecidable [Hof95]. We will thus not postulate this rule in 
the following, and thus consider intensional type theory, for which typechecking 
can be mechanized. We will moreover see that not postulating that reflexivity 
is the only possible proof of equality allows for much richer models. 


In Agda. The eliminator J corresponds to matching a proof of equality with refl:

J : {A : Set} {x y : A} (p : x ≡ y)
    (B : (x y : A) → x ≡ y → Set)
    (r : (x : A) → B x x refl)
    → B x y p
J {A} {x} {.x} refl B r = r x


Note that the second line corresponds precisely to the computation rule for 
identity. 

In Agda, for reasons explained above, the uniqueness rule does not hold, but 
the variant expressed with propositional equality instead of definitional equality 
does: 

J-η : {A : Set} {x y : A} (p : x ≡ y)
      (P : (x y : A) (p : x ≡ y) → Set)
      (t : (x y : A) (p : x ≡ y) → P x y p) →
      J p P (λ x → t x x refl) ≡ t x y p
J-η refl P t = refl
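
To illustrate that J is indeed the only primitive needed to reason about
identities, here is a small self-contained sketch (not taken from the text)
which derives symmetry, transport and transitivity from J alone, without any
further pattern matching on refl. The names sym, subst and trans follow the
usual conventions, and the definitions below are one possible choice among
several:

```agda
{-# OPTIONS --without-K #-}

data _≡_ {A : Set} (x : A) : A → Set where
  refl : x ≡ x

-- The eliminator, as defined in the text.
J : {A : Set} {x y : A} (p : x ≡ y)
    (B : (x y : A) → x ≡ y → Set)
    (r : (x : A) → B x x refl)
    → B x y p
J {A} {x} {.x} refl B r = r x

-- Symmetry: the property is "y ≡ x", which holds by refl when y is x.
sym : {A : Set} {x y : A} → x ≡ y → y ≡ x
sym p = J p (λ x y _ → y ≡ x) (λ x → refl)

-- Transport: the property is "P x → P y", the identity when y is x.
subst : {A : Set} (P : A → Set) {x y : A} → x ≡ y → P x → P y
subst P p = J p (λ x y _ → P x → P y) (λ x u → u)

-- Transitivity: the property is "y ≡ z → x ≡ z", the identity when y is x.
trans : {A : Set} {x y z : A} → x ≡ y → y ≡ z → x ≡ z
trans {z = z} p = J p (λ x y _ → y ≡ z → x ≡ z) (λ x q → q)
```

In each case, the second argument of J is the property B of the elimination
rule and the third one is the proof r in the reflexivity case.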


9.1.4 Leibniz equality. The definition of equality given above is not the first 
one one might think of. Another definition which is perhaps easier to accept 
was proposed by Leibniz [Lei86]. In this context, two things are said to be 


— identical when they are propositionally equal,


— indiscernible when a property satisfied by one is necessarily satisfied by 
the other. 


There are two possible implications between those notions. The implication

    identical ⇒ indiscernible

is called the principle of indiscernibility of identicals. This is easy to take for
granted: if two things x and y are equal then we should be able to replace an
occurrence of x by y in every property. In other words, equality should be a
congruence. The other implication


    indiscernible ⇒ identical


is called the principle of identity of indiscernibles: it states that two things 
satisfying the same properties are the same. This is somewhat of an “interac- 
tive” point of view on the world, considering that in order for two things to be 
distinct, there should be some sort of experiment which allows distinguishing 
between the two. Leibniz postulated that both principles hold, i.e. the two no- 
tions are equivalent. The reference often quoted for the second principle is the 
following [Lei86]: 


il n’est pas vray, que deux substances se ressemblent entierement et 
soyent differentes solo numero 


(which goes on with assertions such as “On peut même dire que toute substance
porte en quelque façon le caractère de la sagesse infinie et de la toute puissance
de Dieu, et l’imite autant qu’elle en est susceptible.” which are less clear from
a logical point of view). If we also accept this implication, then we can in fact
take indiscernibility as a definition for equality. This is sometimes called Leibniz
equality:

Definition 9.1.4.1 (Leibniz equality). Two things are equal when every property 
satisfied by one is also satisfied by the other. 


We write x ≐ y when x and y are equal according to Leibniz’s definition, i.e. when
for every predicate P(z), with a free variable z, we have

    P(x) ⇒ P(y)        (9.1)


In Agda, this can be formalized in the following way: 

_≐_ : {A : Set} → (x y : A) → Set₁
_≐_ {A} x y = (P : A → Set) → (P x → P y)


This relation is clearly reflexive: 


≐-refl : {A : Set} {x : A} → x ≐ x
≐-refl P p = p


It is however not obvious that it is symmetric. In fact, our first inclination would
have been to take P(x) ⇔ P(y) in the definition (9.1), i.e. an equivalence
instead of an implication, so that symmetry would be obvious. However, this is
not necessary, because the converse implication can always be deduced:

Lemma 9.1.4.2. If x ≐ y then y ≐ x.

Proof. Suppose that x ≐ y, and fix a predicate P. Consider the predicate
Q(z) = (P(z) ⇒ P(x)). By definition of x ≐ y, we have that Q(x) implies Q(y),
i.e. P(x) ⇒ P(x) implies P(y) ⇒ P(x). But P(x) ⇒ P(x) is obviously true,
so that we have P(y) ⇒ P(x). Since this holds for every predicate P, we
have y ≐ x.


In Agda, this can be formalized as follows: 


≐-sym : {A : Set} {x y : A} → x ≐ y → y ≐ x
≐-sym {x = x} e P = e (λ z → (P z → P x)) (λ p → p)


Transitivity can be shown in a similar fashion: 


≐-trans : {A : Set} {x y z : A} → x ≐ y → y ≐ z → x ≐ z
≐-trans {x = x} e e' P = e' (λ z → (P x → P z)) (e P)


Now, the question is how this definition of equality ≐ compares to the
propositional equality ≡ of the previous section: if the two did not agree then we
would have to discuss which one is the right one, which looks like a
metaphysical debate. Fortunately, both of them can be shown to coincide. The fact that
propositional equality implies Leibniz equality follows immediately by
induction, since when x ≡ y, we can restrict to the case where x and y are the same
(and the proof of x ≡ y is reflexivity):


≡-to-≐ : {A : Set} {x y : A} → x ≡ y → x ≐ y
≡-to-≐ refl = ≐-refl

and the converse implication can be obtained as a variant of the proof that ≐
is symmetric:

≐-to-≡ : {A : Set} {x y : A} → x ≐ y → x ≡ y
≐-to-≡ {x = x} e = e (λ z → x ≡ z) refl
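
The same technique of instantiating the predicate P suitably gives other
expected properties of equality directly on Leibniz equality. As an
illustration (this lemma and its name ≐-cong are ours, not from the text), here
is a sketch showing that every function preserves Leibniz equality:

```agda
-- Leibniz equality, as defined in the text.
_≐_ : {A : Set} → (x y : A) → Set₁
_≐_ {A} x y = (P : A → Set) → (P x → P y)

-- Congruence: given a predicate P on B, we use the hypothesis with
-- the predicate "P (f z)" on A.
≐-cong : {A B : Set} (f : A → B) {x y : A} → x ≐ y → f x ≐ f y
≐-cong f e P = e (λ z → P (f z))
```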


More details can be found in [ACD+18].

9.1.5 Extensionality of equality. Two things are said to be extensionally equal
when their constituents are equal. We expect that equality coincides with
extensional equality, and it is in fact so for inductively defined types.

Extensional equality on pairs. Two pairs are extensionally equal when their
members are equal. It is easy to show that two extensionally equal pairs are
equal:

×-≡ : {A B : Set} {x x' : A} {y y' : B} →
      x ≡ x' → y ≡ y' → (x , y) ≡ (x' , y')
×-≡ refl refl = refl

and conversely, two equal pairs are extensionally so:

≡-× : {A B : Set} {x x' : A} {y y' : B} →
      (x , y) ≡ (x' , y') → (x ≡ x') × (y ≡ y')
≡-× refl = refl , refl


Extensional equality on lists. Similarly, two lists are extensionally equal when 
they have the same (i.e. equal) elements. In Agda, this relation can be defined 
inductively by 


data _==_ {A : Set} : (l l' : List A) → Set where
  ==-[] : [] == []
  ==-∷  : {x x' : A} {l l' : List A} →
          x ≡ x' → l == l' → (x ∷ l) == (x' ∷ l')

This relation is easily shown to be reflexive by induction

==-refl : {A : Set} (l : List A) → l == l
==-refl [] = ==-[]
==-refl (x ∷ l) = ==-∷ refl (==-refl l)

from which one can show that equality implies extensional equality:

≡-== : {A : Set} {l l' : List A} → l ≡ l' → l == l'
≡-== {l = l} refl = ==-refl l


Conversely, one can show that two lists with the same head and the same tail 
are equal: 


≡-∷ : {A : Set} → {x x' : A} → {l l' : List A} →
      x ≡ x' → l ≡ l' → x ∷ l ≡ x' ∷ l'
≡-∷ refl refl = refl


from which one can deduce that two extensionally equal lists are equal: 


==-≡ : {A : Set} {l l' : List A} → l == l' → l ≡ l'
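
One possible way of completing this proof is by induction on the derivation of
extensional equality, using the lemma about heads and tails in the inductive
case. The following sketch assumes the Agda standard library for lists and
equality; the name ==-≡ and the clauses of the last function are our
reconstruction, not necessarily the intended ones:

```agda
open import Data.List using (List; []; _∷_)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

-- Extensional equality on lists, as in the text.
data _==_ {A : Set} : (l l' : List A) → Set where
  ==-[] : [] == []
  ==-∷  : {x x' : A} {l l' : List A} →
          x ≡ x' → l == l' → (x ∷ l) == (x' ∷ l')

-- Two lists with equal heads and equal tails are equal.
≡-∷ : {A : Set} {x x' : A} {l l' : List A} →
      x ≡ x' → l ≡ l' → x ∷ l ≡ x' ∷ l'
≡-∷ refl refl = refl

-- Extensionally equal lists are equal: the empty case is reflexivity,
-- and the inductive case combines the equality of heads with the
-- induction hypothesis on tails.
==-≡ : {A : Set} {l l' : List A} → l == l' → l ≡ l'
==-≡ ==-[]      = refl
==-≡ (==-∷ p q) = ≡-∷ p (==-≡ q)
```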


Extensional equality on functions. Similarly again, we declare that two functions 
f and g of type A → B are extensionally equal when, for every element x of
type A, we have f(x) = g(x). Clearly, two equal functions are extensionally so

≡-ext : {A B : Set} → {f g : A → B} →
        f ≡ g → (x : A) → f x ≡ g x
≡-ext refl x = refl

(this function will be called happly in section 9.4.1), but the converse property
cannot be shown because we have no induction principle at our disposal to show
that two functions are equal. For instance, one cannot show

id-add-0 : (λ n → n + 0) ≡ (λ n → n)


Try it for yourself in order to get convinced. 
This means that there is no proof of the following function extensionality 
principle: 


FE : Set₁
FE = {A B : Set} {f g : A → B} → ((x : A) → f x ≡ g x) → f ≡ g

not to mention the dependent function extensionality principle, which is the
generalization adapted to dependent functions:

DFE : Set₁
DFE = {A : Set} {B : A → Set} {f g : (x : A) → B x} →
      ((x : A) → f x ≡ g x) → f ≡ g


See [BPT17] for a proof of this fact. 

This situation is deeply unsatisfactory: this means that we cannot really use 
equality to reason on functions. We could add (dependent) function extension- 
ality as an axiom with 


postulate funext : DFE 


but we would not get very far because we would not have any computation rule 
associated to it, making proofs very hard in practice. Also, function extension- 
ality seems to contradict the constructivity of proofs. Namely, a given function 
can be implemented in various ways, with various algorithms and various com- 
plexities (see section 2.1.1 for an example), and function extensionality seems to 
simply destroy it all, since we are considering all of them as equal. In fact, this 
reasoning would hold if proofs of equality did not have any content... but we 
will see that it is not the case in the next section: the way we prove that two
functions are equal is relevant here and cannot simply be discarded. A much better
treatment of equality is proposed by homotopy type theory, which is presented 
in this chapter, and function extensionality will be a consequence of its main 
axiom, univalence, see section 9.4.9. It resolves the tension by considering that 
two equal things are not the same, but can be deformed one into the other. 
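
As an illustration of what postulating function extensionality gives, and of
its limits, the following sketch (assuming the Agda standard library for natural
numbers and equality) proves the equality id-add-0 which could not be shown
above. The resulting proof is however stuck on the postulate: it does not
reduce to refl, which is the lack of computation rule mentioned above.

```agda
open import Data.Nat using (ℕ; zero; suc; _+_)
open import Relation.Binary.PropositionalEquality using (_≡_; refl; cong)

DFE : Set₁
DFE = {A : Set} {B : A → Set} {f g : (x : A) → B x} →
      ((x : A) → f x ≡ g x) → f ≡ g

-- Function extensionality is assumed here, not proved.
postulate funext : DFE

-- Right unitality of addition, proved by induction.
+-zero : (n : ℕ) → n + zero ≡ n
+-zero zero    = refl
+-zero (suc n) = cong suc (+-zero n)

-- The pointwise equality +-zero is turned into an equality of functions.
id-add-0 : (λ (n : ℕ) → n + 0) ≡ (λ n → n)
id-add-0 = funext +-zero
```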


9.1.6 Uniqueness of identity proofs. At some point in the 90s, people 
started to wonder: is there a proof-theoretic content in proofs of equality? Or 
more prosaically: can there be more than one proof of the equality between 
two given terms? This suggested investigating the provability of the following 
property called uniqueness of identity proofs 


UIP : Set₁
UIP = {A : Set} {x y : A} (p q : x ≡ y) → p ≡ q


which states that two proofs of x = y for some terms x and y are necessarily 
equal. 

In particular, in the case where y is x, we know a particular proof of x = x, namely
refl(x). If we are only interested in such cases, we can also consider the following
variant, which we call here uniqueness of reflexivity proofs: 




URP : Set₁
URP = {A : Set} {x : A} (p : x ≡ x) → p ≡ refl

Clearly, URP is a particular case of UIP:

UIP-URP : UIP → URP
UIP-URP UIP r = UIP r refl


Interestingly, we can also recover UIP from URP. Namely, consider two identity
proofs p, q : x = y. We can picture the identities p and q as paths from x to y,
as in the figure below:


This diagram makes it plausible that showing that p is the same as q (in the 
sense that p = q), should be equivalent to showing that the path from y to y, 
obtained as the concatenation of p taken backward followed by q is the same as 
the reflexivity on y. And indeed, one can show the following implication: 


loop-≡ : {A : Set} {x y : A} (p q : x ≡ y) →
         trans (sym p) q ≡ refl → p ≡ q
loop-≡ refl q h = sym h

from which one deduces that URP implies UIP:

URP-UIP : URP → UIP
URP-UIP URP p q = loop-≡ p q (URP (trans (sym p) q))


The axiom K. A third equivalent property is called K, and is due to
Streicher [Str93]. It can be thought of as the “Leibniz variant” (see section 9.1.4) of
URP: if the only proof of an equality x = x is reflexivity then, in order to show
that a property P depending on such a proof is valid, it should be enough to
show it in the case of reflexivity. This property can thus be formulated as

K : Set₁
K = {A : Set} {x : A} →
    (P : (x ≡ x) → Set) → P refl → (p : x ≡ x) → P p


It is simple to show that URP implies K: 


URP-K : URP → K
URP-K URP P r p = subst P (sym (URP p)) r

and that K implies URP:

K-URP : K → URP
K-URP K p = K (λ p → p ≡ refl) refl p


Note that K is a slight variant of the eliminator J, where we consider propositions
depending on proofs of x = x (instead of x = y), thus the name. However,
K cannot be proved from J (try it!): this can be demonstrated by observing 
that the non-trivial models of homotopy type theory validate the latter but not 
the former: the first such model was found by Hofmann and Streicher [HS98], 
by interpreting types as groupoids. 

Pattern matching without K. If we try to prove UIP or K in vanilla Agda, using
pattern matching as usual, something unexpected happens, as first noticed by 
Coquand [Coq92b]: we succeed! Namely, we can show UIP by 


UIP-proof : UIP 
UIP-proof refl refl = refl 


and K by 


K-proof : K 
K-proof P r refl =r 


This means that Agda is not implementing dependent type theory exactly as 
we have presented it: in fact, the default pattern matching algorithm of Agda 
is simply too permissive. In order to use a saner algorithm, we should always 
start our files with 


{-# OPTIONS --without-K #-} 


and the above proofs of UIP and K will not be accepted anymore. The reason 
why the current pattern matching is enabled by default is that it simplifies 
proofs, if one is prepared to lose all information about identities: it can be 
shown that using this algorithm essentially amounts to adding UIP (and not 
more) to the dependent type theory [McB00].


9.2 Types as spaces 


9.2.1 Intuition about the model. In order to better understand what logic 
looks like in the absence of uniqueness of identity proofs, one should be prepared 
to accept the following change of point of view: types should be interpreted not 
as booleans, nor as sets, but as spaces. Similarly, the elements of a type A → B
should be interpreted not as implications, nor as functions, but as continuous
functions. This interpretation is the starting point of homotopy type theory
which was pioneered by Voevodsky and other people [Uni13]. In order to make
this clear, even in Agda, starting from now, we will write Type instead of Set 
to designate the type of types, which can be done by defining 


Type : (i : Level) → Set (lsuc i)
Type i = Set i 


Types are not always sets! 


Spaces. We will deliberately remain vague about what we mean by a space, 
but one can think of those as geometric shapes, in arbitrary dimension, i.e. as 
topological spaces, or as something that can be obtained by gluing segments, 
surfaces and volumes in arbitrary dimension. More details can be found in 
standard algebraic topology textbooks such as [Hat02]. Some examples in low 
dimensions are 


[Figure: examples of spaces in low dimensions]


Most importantly, those spaces should be considered up to “deformations” which 
preserve the shape: we do not distinguish between spaces which look roughly 
the same. 

Paths. The reason why this interpretation is useful to reason about identities is
that we now have a representation for them: they correspond to paths, as we 
now explain. We write I for the interval space: 


Concretely, it can be defined as the set I = [0,1] of reals between 0 and 1 (both 
included) equipped with the euclidean topology. A path in a space A from a 
point x to a point y is a continuous function 


    p : I → A


such that p(0) = x and p(1) = y: 


Such a path can also be thought of as a continuous way to go from x to y 
depending on a “time” parameter t ∈ I: at time t = 0 we are at x, at time t = 1
we are at y, and at a time t in between we are at p(t).

Given a point x there is always a path from x to x, the constant path, which
is the function defined as p(t) = x for every t ∈ I. This corresponds to remaining
at x. 


Interpreting types. From now on, we are going to work with the following inter- 
pretation of types in mind: 


— we interpret a type A as a space,

— an element x of type A will be seen as a point of A,

— an identity proof p in Id_A(x, y) as a path from x to y,

— a function f : A → B as a continuous function from A to B.

For this reason, we sometimes write p : x = y for a path from x to y. In
particular, a reflexivity proof refl(x) : x = x will be interpreted as the constant
path from x to x. We insist on the fact that the interpretations of functions are
always continuous, even if we omit mentioning it.

Given two elements x, y : A and two identities p, q : Id_A(x, y), consider an
identity α : Id_{Id_A(x,y)}(p, q) from p to q. Topologically, it will correspond to a
continuous way of deforming the path p into the path q within paths from x
to y, i.e. the endpoints are fixed. It thus corresponds to a surface:

Similarly, paths between paths between paths correspond to volumes, and so
on.

In particular, consider a type corresponding to the circle (should there be
one). Given two distinct points x and y, we have two distinct paths p and q
going from x to y:


Moreover, since the circle is hollow, there can be no continuous way of
deforming p into q. A type theory which can account for such a type will not validate
the principle of uniqueness of identity proofs. 


Homotopy equivalence. We mentioned that spaces are considered up to defor- 
mation and we now want to make more precise this notion of deformation we are 
using. We say that two continuous functions f, g : A → B are homotopic when
for every point x : A there is a path f(x) = g(x), which varies continuously
with x. We write f ∼ g when f and g are homotopic.

Two spaces A and B are homotopy equivalent when there are two functions 


    f : A → B        and        g : B → A

such that

    g ∘ f ∼ id_A        and        f ∘ g ∼ id_B

where id_A : A → A is the identity function, defined by id_A(x) = x for x in A, and
similarly for id_B. This is the notion we will use when we think of two spaces
as being equivalent up to deformation: the adjective homotopy in homotopy 
type theory refers to the fact that we are considering spaces up to homotopy 
equivalence. 
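
In type theory, these notions can be sketched as follows. This is only an
illustration under our own naming (_∼_, HEquiv and hequiv-refl are hypothetical
names), and it is not necessarily the definition of equivalence adopted later
on; since every function definable in type theory is interpreted as a continuous
one, the continuity of the family of paths does not need to be stated:

```agda
{-# OPTIONS --without-K #-}

data _≡_ {A : Set} (x : A) : A → Set where
  refl : x ≡ x

id : {A : Set} → A → A
id x = x

_∘_ : {A B C : Set} → (B → C) → (A → B) → A → C
(g ∘ f) x = g (f x)

-- Two functions are homotopic when they are pointwise related by a path.
_∼_ : {A B : Set} (f g : A → B) → Set
_∼_ {A} f g = (x : A) → f x ≡ g x

-- Data witnessing that two types are homotopy equivalent.
record HEquiv (A B : Set) : Set where
  field
    f   : A → B
    g   : B → A
    g∘f : (g ∘ f) ∼ id
    f∘g : (f ∘ g) ∼ id

-- Every type is homotopy equivalent to itself, through identities.
hequiv-refl : {A : Set} → HEquiv A A
hequiv-refl = record
  { f   = id
  ; g   = id
  ; g∘f = λ x → refl
  ; f∘g = λ x → refl
  }
```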

For instance, the space A reduced to a point x (on the left) is homotopy
equivalent to the disk B (on the right):

[Figure: the point x of A and the disk B containing a point y, with the maps f and g between them]

Namely, we can take the function f : A → B which takes x to some point y
of B, and the function g : B → A which takes every point of B to x. This is
a homotopy equivalence since, for the only point x of A we have g ∘ f(x) = x,
and for every point z of B there is a path from y = f ∘ g(z) to z, which depends
continuously on z (see the picture).

Consider the variant of the preceding situation, where A is still reduced to
a point x, but B is now a circle instead of a disk:

[Figure: the point x of A and the circle B containing a point y, with the maps f and g between them]

We can still define functions f and g in the same way. Moreover, given a point z
in B there is still a path from y = f ∘ g(z) to z. However, there is no way of
choosing such a path in a continuous way: when moving z around the circle, at 
some point the path has to jump from turning counterclockwise to clockwise, 
or from turning once around the circle to turning twice, etc. For this reason the 
point and the circle are not homotopy equivalent. 


Remark 9.2.1.1. In the previous example, it can be noted that the circle has a 
hole whereas the point does not. It can be shown that homotopy equivalence 
preserves the number of holes in any dimension [Hat02] (these are called the 
Betti numbers and are closely related to homotopy groups), from which we 
could have easily seen that the two spaces are not equivalent. There is actually 
even a subtle converse to this property. A map f : A → B between spaces is
a weak homotopy equivalence when it induces a bijection between the holes (in 
any dimension) of A and those of B (as a particular case, it should induce a 
bijection between the path components of A and those of B). When A and B 
are “nice” spaces, by which we mean gluing of disks (traditionally called CW-
complexes), a map f : A → B is a weak homotopy equivalence if and only if
it is a homotopy equivalence (this is known as the Whitehead theorem). The 
restriction to CW-complexes is not really a limitation here, because any space 
can be shown to be weakly homotopy equivalent to a CW-complex. 


Eliminating identities. The elimination principle of identity types says that in 
order to show a property on a space containing a path p, it is enough to show 
this property on the corresponding space where p has been made a constant 
path. For instance, suppose that we want to show that the circle A satisfies 
UIP, i.e. any two paths are equal: 


We thus begin our proof with 


UIP : (x y : A) (p q : x ≡ y) → p ≡ q 
UIP x y p q = ? 


We begin by eliminating p, and Agda takes us to the proof 


UIP : (x y : A) (p q : x ≡ y) → p ≡ q 
UIP x .x refl q = ? 


which corresponds to restricting to the space A’ 


[Figure: the space A’, consisting of the point x together with the loop q.] 


obtained from the circle by assimilating p to a constant path. This is a perfectly 
valid thing to do because the spaces A and A’ are clearly homotopy equivalent. 
However, if we proceed further and eliminate q, Agda gets us to 

UIP : (x y : A) (p q : x ≡ y) → p ≡ q 
UIP x .x refl refl = ? 


which means that we are now restricting to the space A” reduced to a point 


[Figure: the space A”, reduced to the single point x.] 


obtained from A’ by assimilating q to the constant path. This step should not 
be valid because, as we have seen, A’ and A” are not homotopy equivalent. In 
fact, if we activate the flag --without-K of Agda, as we should always do, Agda 
rejects this last step by issuing an error: 


I'm not sure if there should be a case for the constructor refl, 
because I get stuck when trying to solve the following unification 
problems (inferred index ≟ expected index): 
  x₁ ≟ x₁ 
Possible reason why unification failed: 
  Cannot eliminate reflexive equation x₁ = x₁ of type A₁ because K 
  has been disabled. 
when checking that the expression ? has type refl ≡ q 


which is its verbose way of saying that you are trying to do something forbidden 
in the absence of axiom K. 
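
To make the role of axiom K concrete, here is a minimal sketch showing that the rejected step becomes provable once K is available as a hypothesis. The formulation of K below and the name UIP-from-K are assumptions made for this illustration (K is stated as a type, not postulated), and Set is used at the lowest level instead of Type i.

```agda
{-# OPTIONS --without-K #-}
open import Agda.Builtin.Equality using (_≡_; refl)

-- Axiom K for a type A, stated as a type (hypothetical formulation):
-- a property of loops at x holds as soon as it holds for refl.
K : Set → Set₁
K A = (x : A) (P : x ≡ x → Set) → P refl → (p : x ≡ x) → P p

-- Eliminating p is accepted without K; the remaining loop q is
-- handled by the hypothesis k instead of a second pattern matching.
UIP-from-K : {A : Set} → K A → (x y : A) (p q : x ≡ y) → p ≡ q
UIP-from-K k x .x refl q = k x (λ r → refl ≡ r) refl q
```

Only the first elimination is done by pattern matching here, which is why the file is accepted with the flag --without-K.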


Univalence. We will see that the type theory (without K) does not exactly 
match the intuition that we have of types as spaces: some properties that we 
expect to be shown cannot be proved. The reason is that we lack some ways of 
constructing equalities. For instance, we cannot construct non-trivial equalities 
between functions: in particular, we cannot prove function extensionality. In 
order for logic and topology to match precisely, one needs to assume an axiom, 
called univalence. It will only be presented in section 9.4, but we will mention 
some of the properties which it allows us to prove before that, in order to motivate 
the need for it (e.g. function extensionality will be a consequence of it). 


9.2.2 The structure of paths. We shall now study the constructions and 
operations which are available on paths. The first one, which we have seen many 
times, is the construction of the constant path on a point x, which is simply 
given by refl. Given two paths p : x = y and q : y = z such that the end of p 
matches the beginning of q (both are y), we can build their concatenation p ∙ q, 
which is a path from x to z. If we see them as continuous functions p : I → A 
and q : I → A, where I is the interval [0,1], this is defined as 


\[
(p \cdot q)(t) =
\begin{cases}
p(2t) & \text{if } 0 \le t \le 1/2, \\
q(2t - 1) & \text{if } 1/2 \le t \le 1.
\end{cases}
\]

In the following, we will generally not give such explicit constructions, and 
simply provide the formalization in Agda, which is in this case 


_∙_ : ∀ {i} {A : Type i} {x y z : A} → 
      (p : x ≡ y) → (q : y ≡ z) → x ≡ z 
refl ∙ q = q 

Of course, we have already seen this proof in section 6.6.2: this is simply the 
transitivity of =. As expected, the constant path is a unit for concatenation on 
the left: 


∙-unit-l : ∀ {i} {A : Type i} {x y : A} → 
           (p : x ≡ y) → refl ∙ p ≡ p 
∙-unit-l p = refl 


This does not mean that, given a path p : x = y, the paths refl ∙ p and p are the 
same. In fact, they are not since the one on the left is 

\[
(\mathrm{refl} \cdot p)(t) =
\begin{cases}
x & \text{if } 0 \le t \le 1/2, \\
p(2t - 1) & \text{if } 1/2 \le t \le 1,
\end{cases}
\]

and is a different function from p: we usually don’t have (refl ∙ p)(t) = p(t) for 
every t ∈ I. They are however homotopic, in the sense that there is a path 
(i.e. a deformation) from the former to the latter (exercise: explicitly define this 
path). Similarly, the constant path is also a unit on the right for concatenation: 


∙-unit-r : ∀ {i} {A : Type i} {x y : A} → 
           (p : x ≡ y) → p ∙ refl ≡ p 
∙-unit-r refl = refl 


and concatenation is associative: 


∙-assoc : ∀ {i} {A : Type i} {x y z w : A} → 
          (p : x ≡ y) → (q : y ≡ z) → (r : z ≡ w) → 
          (p ∙ q) ∙ r ≡ p ∙ (q ∙ r) 
∙-assoc refl refl refl = refl 


Next, given a path p : x = y, we can define the inverse path p⁻¹ : y = x by 
p⁻¹(t) = p(1 − t), i.e. the path p taken “backwards”. In Agda, it is written ! p 
(or sym p) and defined by 


!_ : ∀ {i} {A : Type i} {x y : A} → x ≡ y → y ≡ x 
! refl = refl 


Again, we can show the expected properties such as the fact that it is an 
inverse on the left: 


∙-inv-l : ∀ {i} {A : Type i} {x y : A} → 
          (p : x ≡ y) → ! p ∙ p ≡ refl 
∙-inv-l refl = refl 


which means that taking a path backwards and then forward is the same (up to 
homotopy) as doing nothing (try it in the street). The same holds on the right: 


∙-inv-r : ∀ {i} {A : Type i} {x y : A} → 
          (p : x ≡ y) → p ∙ ! p ≡ refl 
∙-inv-r refl = refl 


and taking the inverse twice does nothing: 


!-! : ∀ {i} {A : Type i} {x y : A} → (p : x ≡ y) → ! (! p) ≡ p 
!-! refl = refl 
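
As a further illustration of how such properties are obtained by path induction, the following sketch shows that inversion reverses the order of a concatenation. The lemma name !-∙ is an assumption of this example, the definitions of concatenation and inverse are repeated to keep the file self-contained, and Set is used at the lowest level instead of Type i.

```agda
{-# OPTIONS --without-K #-}
open import Agda.Builtin.Equality using (_≡_; refl)

-- Concatenation of paths, by induction on the first path.
_∙_ : {A : Set} {x y z : A} → x ≡ y → y ≡ z → x ≡ z
refl ∙ q = q

-- Inverse of a path.
!_ : {A : Set} {x y : A} → x ≡ y → y ≡ x
! refl = refl

-- Inverting a concatenation reverses the order of the two paths.
-- When both paths are refl, both sides compute to refl.
!-∙ : {A : Set} {x y z : A} (p : x ≡ y) (q : y ≡ z) →
      (! (p ∙ q)) ≡ ((! q) ∙ (! p))
!-∙ refl refl = refl
```

As with the unit and associativity laws, the statement only holds up to a path, which is why its type is itself an identity type.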

Groupoids. If we sum up the situation, given a type A, we have 

— a set A of points, 
— for all points x and y in A, a set x = y of paths from x to y, 
— for every point x, a path refl(x) : x = x, 
— for all points x, y, z and paths p : x = y and q : y = z, a 
concatenation p ∙ q : x = z, 


such that 


— concatenation is associative and admits constant paths as neutral elements 
on both sides, 


— every path admits a path which is an inverse on both sides. 


A groupoid is precisely this structure, if we assume that the two above axioms 
hold up to equality (as opposed to up to homotopy): it consists of a set (of 
points or objects), together with a set (of paths or morphisms) between any pair 
of points, equipped with a composition and identities (constant paths), such 
that composition is associative, unital and admits inverses. The first model 
of dependent type theory which did not validate UIP was actually constructed 
by interpreting types as groupoids [HS98]. It can be seen as a “degenerate” 
version of the model of spaces, in the sense that the only paths between paths 
are constant paths. 


9.3 n-types 


Now that we have this point of view on types as spaces, we can start classify- 
ing types depending on their topological properties. A particularly interesting 
classification is given by n-types, which are types which contain no holes of 
dimension k > n, for some natural number n. 


9.3.1 Propositions. The simplest kind of types are propositions [Uni13, 
Section 3.3]. We can think of a proposition as either being 


— a point, meaning that it is true, or 
— empty, meaning that it is false. 


In particular, when it is true, we only allow for one point: if there were many, 
it would mean that there would be many reasons why the proposition would be 
true, which is not what we have in mind for propositions. One should be aware 
that the above description is slightly misleading: 


— it will not be the case that we can prove that a proposition is either empty 
or not, i.e. either true or false, because we live in an intuitionistic world, 
where the excluded middle is not expected to hold, 


— when it is true, we require that there is only one point, call it x₀, up to homotopy: 
this means that if there is another point x, it should be equal to x₀. 


In both cases (true or false), we note that a proposition is such that any two 
points x and y are related by a path x = y: this property holds by definition 
when the proposition is true and is vacuously true when the proposition is empty. 

Definition. The previous remark suggests defining a predicate isProp by 


isProp : ∀ {i} → Type i → Type i 
isProp A = (x y : A) → x ≡ y 


with the intended meaning that isProp A holds when the type A is a proposition. 


Examples. We can show that ⊥ is a proposition (it corresponds to the empty 
space): 


⊥-isProp : isProp ⊥ 
⊥-isProp () 


or that ⊤ is a proposition (it corresponds to the space with one point, namely tt): 


⊤-isProp : isProp ⊤ 
⊤-isProp tt tt = refl 


We can also show that the type of booleans is not a proposition since it has two 
points, true and false, which are not equal: 


Bool-isn'tProp : ¬ (isProp Bool) 
Bool-isn'tProp P with P true false 
Bool-isn'tProp P | () 


We insist once more on the fact that types are handled up to homotopy, so 
that a disk is also an acceptable proposition because it is homotopy equivalent 
to a point. However, there is a worrying situation with our definition: it seems 
that the circle C should also be accepted by our definition although it is not 
equivalent to a point: after all, given any pair of points there is a path between 
them in the circle. However, C is not a proposition and one cannot show isProp C: 
the reason is that there is no way of choosing such paths in a continuous way, and 
we are only allowed to manipulate continuous functions. Namely, one can convince 
oneself that there is no way of choosing a path p_{x,y} : x = y for every pair of 
points x and y in C, in a way which is continuous in both x and y (the reasoning 
is similar to the one we have done above to show that the circle is not homotopy 
equivalent to a point). 


The type of propositions. We can define the type of all propositions as 


hProp : ∀ i → Type (lsuc i) 
hProp i = Σ (Type i) isProp 

(we call it hProp and not Prop because the latter is a reserved keyword in recent 
versions of Agda). It should be remarked that even though we know that this 
type should be small, because we add at most one point on hProp per type in 
Type i, the general rule for handling levels in Σ-types does not allow the result 
to be in Type i, because Type i is at level i+1, so that we have to assume that 
it forms a large type, i.e. at level i+1 (this is further discussed in section 9.3.4, 
with the propositional resizing principle). 


Operations on propositions. In the following, we will use the set-theoretic nota- 
tions for usual operations on types and the logical notations for the correspond- 
ing operations on propositions: 


on types        | × | ⊔ | → | Π 
on propositions | ∧ | ∨ | ⇒ | ∀ 


(we are using here the more traditional notation ⊔ instead of the usual Agda 
notation ⊎ for coproduct). The Curry-Howard correspondence allowed us to 
identify both lines, but now that we have a rich type theory, we can tear logic 
and types apart again! In order for this to make sense, we should check that 
the operations are well-defined on propositions, i.e. that the result is a 
proposition when applied to propositions. We will see that it is actually not always 
the case and that their definitions have to be adapted to properly operate on 
propositions. 

Propositions are closed under products, i.e. the product of two propositions 
is itself a proposition: 


×-isProp : ∀ {i j} {A : Type i} {B : Type j} → 
           isProp A → isProp B → isProp (A × B) 
×-isProp PA PB (a , b) (a' , b') with PA a a' , PB b b' 
×-isProp PA PB (a , b) (.a , .b) | refl , refl = refl 


We can therefore simply define the conjunction of propositions as their product: 


_∧_ : ∀ {i j} → Type i → Type j → Type (lmax i j) 
A ∧ B = A × B 


Similarly, we expect that propositions are closed under function spaces, so 
that we can simply define implication as function space. In order to show this, 
it turns out that we have to assume function extensionality (which will become 
a theorem in section 9.4.9), because we have no useful way to show equalities 
between functions otherwise, see section 9.1.5. If this is assumed, one can show 
that A → B is a proposition as soon as B is: 


→-isProp : ∀ {i j} {A : Type i} {B : Type j} → 
           isProp B → isProp (A → B) 
→-isProp PB f g = funext (λ x → PB (f x) (g x)) 


and similarly for Π-types, if we assume dependent function extensionality: 


Π-isProp : ∀ {i j} {A : Type i} → {B : A → Type j} → 
           ((x : A) → isProp (B x)) → isProp ((x : A) → (B x)) 
Π-isProp PB f g = funext (λ x → PB x (f x) (g x)) 

In particular, the negation ¬A of a type A is always a proposition since it is the 
type A → ⊥ by definition, and ⊥ is a proposition: 


¬-isProp : ∀ {i} {A : Type i} → isProp (¬ A) 
¬-isProp = →-isProp ⊥-isProp 
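
The dependency of these closure properties on function extensionality can be made explicit by taking it as a hypothesis rather than as a postulate. The following is a minimal sketch: the names FunExt and →-isProp′ are assumptions of this example, only the non-dependent case is covered, and Set is used at the lowest level instead of Type i.

```agda
{-# OPTIONS --without-K #-}
open import Agda.Builtin.Equality using (_≡_)

isProp : Set → Set
isProp A = (x y : A) → x ≡ y

-- Non-dependent function extensionality, stated as a type:
-- pointwise equal functions are equal.
FunExt : Set₁
FunExt = {A B : Set} {f g : A → B} → ((x : A) → f x ≡ g x) → f ≡ g

-- Functions into a proposition form a proposition, given FunExt.
→-isProp′ : FunExt → {A B : Set} → isProp B → isProp (A → B)
→-isProp′ funext PB f g = funext (λ x → PB (f x) (g x))
```

Stating the result this way keeps track of exactly where extensionality is used, at the price of an extra argument.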


The situation for the coproduct A ⊔ B of two propositions is more delicate. 
We have to assume that the types A and B have an “empty intersection” in 
order to show that their coproduct is itself a proposition. Here, having an 
empty intersection amounts to supposing that ¬(A ∧ B) holds, or equivalently 
that A ⇒ B ⇒ ⊥ holds. 


⊔-isProp : ∀ {i j} {A : Type i} {B : Type j} → 
           isProp A → isProp B → (A → B → ⊥) → isProp (A ⊔ B) 
⊔-isProp PA PB f (inl x) (inl y) = ap inl (PA x y) 
⊔-isProp PA PB f (inl x) (inr y) = ⊥-elim (f x y) 
⊔-isProp PA PB f (inr x) (inl y) = ⊥-elim (f y x) 
⊔-isProp PA PB f (inr x) (inr y) = ap inr (PB x y) 


(the operation ap above is another name for cong, see section 6.6.2). The 
condition on intersection is really needed. For instance, the type ⊤ is a proposition 
but the type ⊤ ⊔ ⊤ is not because it has two points: one could show that it is 
not a proposition in the same way we were able to show that Bool was not a 
proposition in section 9.3.1 (after all, the types ⊤ ⊔ ⊤ and Bool are isomorphic). 
An important consequence of the above lemma is that, for every proposition A, 
the type ¬A ⊔ A is also a proposition: 


isDec-isProp : ∀ {i} {A : Type i} → isProp A → isProp (isDec A) 
isDec-isProp PA = ⊔-isProp ¬-isProp PA (λ a' a → a' a) 


Above, isDec A is simply a notation for ¬ A ⊔ A, meaning that A is decidable: 
we have just shown that, for a proposition, being decidable is a proposition. 
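
The claim that the coproduct of ⊤ with itself is not a proposition can be checked along the same lines as for Bool. The following is a self-contained sketch in which the coproduct, the empty type and isProp are redefined locally at the lowest universe level, and the name ⊤⊔⊤-isn'tProp is an assumption of this example.

```agda
{-# OPTIONS --without-K #-}
open import Agda.Builtin.Equality using (_≡_)
open import Agda.Builtin.Unit using (⊤; tt)

data ⊥ : Set where

data _⊔_ (A B : Set) : Set where
  inl : A → A ⊔ B
  inr : B → A ⊔ B

isProp : Set → Set
isProp A = (x y : A) → x ≡ y

-- A proof that the coproduct is a proposition would identify
-- inl tt and inr tt, which are built from distinct constructors.
⊤⊔⊤-isn'tProp : isProp (⊤ ⊔ ⊤) → ⊥
⊤⊔⊤-isn'tProp P with P (inl tt) (inr tt)
... | ()
```

The absurd pattern is accepted because no equality can hold between terms starting with different constructors.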

We will be able to give a proper definition of ∨ (not just for disjoint propositions) 
in section 9.3.4, but we do not have the tools to do so for now. For similar reasons 
as for coproduct, propositions are not closed under Σ-types and we also defer 
the definition of the ∃ quantifier. 

As a final remark about connectives on propositions, we mention that it 
would be cleaner and more conceptual to define them directly on hProp, i.e. have 
them provide the proof that they produce propositions. For instance, conjunc- 
tion could be defined as 


_∧_ : ∀ {i j} → hProp i → hProp j → hProp (lmax i j) 
(A , PA) ∧ (B , PB) = (A × B) , ×-isProp PA PB 


We choose not to do this here in order to keep closer to bare metal and avoid 
small lemmas which would obfuscate the code at first read. 


Predicates and propositions. For any type A, the type isProp A is itself a propo- 
sition: being a proposition is a proposition. If this was not the case, there could 
be multiple reasons why a proposition could be a proposition, and the meaning 
of this would be rather obscure. The proof, which is called isProp-isProp, is 
deferred to section 9.3.2. 

Up to now, we have formalized a predicate on a type A as a function P 
whose type is A → Set, see section 6.5.9. For the same reasons as above, such a 
function really deserves the name of predicate only when it is the case that P x 
is a proposition for every element x of type A. We can thus formalize the fact of 
being a predicate as 


isPred : ∀ {i j} {A : Type i} → (A → Set j) → Set (lmax i j) 
isPred {_} {_} {A} P = (x : A) → isProp (P x) 


The function isProp-isProp described above shows that isProp is a predicate: 


isProp-isPred : ∀ {j} → isPred {j = j} isProp 
isProp-isPred A = isProp-isProp 


Similarly, as expected, being a predicate is itself a predicate: 


isPred-isPred : ∀ {i j} {A} → isPred (isPred {i} {j} {A}) 
isPred-isPred P = Π-isProp (λ x → isProp-isProp) 


Propositional extensionality. On propositions, there are two sensible notions of 
being the same: 

— propositional equality =, 
— logical equivalence ↔. 

In Agda, the second one is defined as usual from implication and conjunction by 


_↔_ : ∀ {i} → Type i → Type i → Type i 
A ↔ B = (A → B) ∧ (B → A) 


We now briefly investigate the relationship between the two. 
It is easy to observe that two equal propositions are equivalent, by induction 
on their equality: 


≡-to-↔ : ∀ {i} → {A B : Type i} → A ≡ B → A ↔ B 
≡-to-↔ refl = (λ x → x) , (λ x → x) 


The converse implication is called propositional extensionality, or PE: 

(A ↔ B) ⇒ (A = B) 

which can be defined in Agda by 


PE : ∀ {i} → Type (lsuc i) 
PE {i} = ∀ {A B : Type i} → isProp A → isProp B → A ↔ B → A ≡ B 


This implication cannot be shown and could be added as an axiom if one wants 
to use it (for instance, we use it in order to show Diaconescu’s theorem in 
section 9.3.4). In fact, we will add univalence, which is a generalization of 
propositional extensionality, as an axiom, and show that it implies propositional 
extensionality in section 9.4.10. 

Propositions as ⊤ or ⊥. At the beginning of this section, we have indicated that 
a proposition should be either empty or a point (up to homotopy), i.e. either ⊥ 
or ⊤. But can we formalize this? A first idea would be to show, for any type A, 
the implication 

isProp(A) ⇒ (A = ⊥) ∨ (A = ⊤) 


However, we will not be able to show this for any type A, because it would allow 
us to decide whether A is true or not, which we cannot because we live in an 
intuitionistic world. However, if we know that A holds then it should be equal 
to ⊤: 

isProp(A) ⇒ A ⇒ (A = ⊤) 


and if A does not hold then it should be equal to ⊥: 

isProp(A) ⇒ ¬A ⇒ (A = ⊥) 


We currently cannot show that, but we will see in section 9.4.6 that it can be 
proved if we assume the univalence axiom. 
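
Both implications already follow from propositional extensionality taken as a hypothesis, which gives a small worked example of how that principle is used. The sketch below is an illustration under stated assumptions: the names true-is-⊤ and false-is-⊥ are hypothetical, PE is passed as an argument instead of being postulated, and everything lives at the lowest universe level.

```agda
{-# OPTIONS --without-K #-}
open import Agda.Builtin.Equality using (_≡_; refl)
open import Agda.Builtin.Unit using (⊤; tt)
open import Agda.Builtin.Sigma using (Σ; _,_)

data ⊥ : Set where

isProp : Set → Set
isProp A = (x y : A) → x ≡ y

-- Logical equivalence, as a pair of implications.
_↔_ : Set → Set → Set
A ↔ B = Σ (A → B) (λ _ → B → A)

-- Propositional extensionality, stated as a type.
PE : Set₁
PE = {A B : Set} → isProp A → isProp B → A ↔ B → A ≡ B

⊤-isProp : isProp ⊤
⊤-isProp tt tt = refl

⊥-isProp : isProp ⊥
⊥-isProp ()

-- A proposition which holds is equivalent, hence equal, to ⊤.
true-is-⊤ : PE → {A : Set} → isProp A → A → A ≡ ⊤
true-is-⊤ pe PA a = pe PA ⊤-isProp ((λ _ → tt) , (λ _ → a))

-- A proposition which does not hold is equal to ⊥.
false-is-⊥ : PE → {A : Set} → isProp A → (A → ⊥) → A ≡ ⊥
false-is-⊥ pe PA ¬a = pe PA ⊥-isProp (¬a , (λ ()))
```

In each case the work consists in exhibiting the two implications; the equality itself is entirely provided by the hypothesis.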


Classical logic. If one is disposed to work with classical logic, as presented in 
section 2.5, one should add the law of excluded middle 


LEM : ∀ {i} → Type (lsuc i) 
LEM {i} = {A : Type i} → isProp A → ¬ A ∨ A 


or double negation elimination 


NNE : ∀ {i} → Type (lsuc i) 
NNE {i} = {A : Type i} → isProp A → ¬ (¬ A) → A 


as postulates. In the above formulation, the reader should note that we are 
restricting those laws to propositions: they are only intended to talk about 
logic. For instance, we expect that the law of excluded middle states that a 
proposition is true or not, not that we can decide whether any type is empty 
or not, and construct an element of this type in the latter case. This axiom is 
consistent with homotopy type theory [KL20]. However, the general form of the 
law of excluded middle 


LEM' : ∀ {i} → Type (lsuc i) 
LEM' {i} = {A : Type i} → ¬ A ∨ A 


(not restricted to propositions) is inconsistent with the axiom of univalence: 
it not only implies that we can choose an element in every non-empty type, 
but also that we should be able to do so in a continuous way, which is not 
possible, see section 9.4.7. 
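
As a small illustration of how the two classical principles restricted to propositions relate, the following sketch derives double negation elimination from excluded middle. It is only an illustration under assumptions: disjunction is rendered by the coproduct ⊔ (harmless here, since ¬A ⊔ A is a proposition when A is), the name LEM-to-NNE is hypothetical, and all types are at the lowest universe level.

```agda
{-# OPTIONS --without-K #-}
open import Agda.Builtin.Equality using (_≡_)

data ⊥ : Set where

⊥-elim : {A : Set} → ⊥ → A
⊥-elim ()

¬_ : Set → Set
¬ A = A → ⊥

data _⊔_ (A B : Set) : Set where
  inl : A → A ⊔ B
  inr : B → A ⊔ B

isProp : Set → Set
isProp A = (x y : A) → x ≡ y

-- The two principles, stated as types and restricted to propositions.
LEM : Set₁
LEM = {A : Set} → isProp A → (¬ A) ⊔ A

NNE : Set₁
NNE = {A : Set} → isProp A → ¬ (¬ A) → A

-- Case analysis on A: if it is false we contradict the double
-- negation, otherwise we return the element we were given.
LEM-to-NNE : LEM → NNE
LEM-to-NNE lem PA ¬¬a with lem PA
... | inl ¬a = ⊥-elim (¬¬a ¬a)
... | inr a  = a
```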


9.3.2 Sets. After having considered propositions, the next interesting kind of 
types are sets [Uni13, section 3.1]. Those are types which are collections of 
points (up to homotopy). A typical set is thus: 

[Figure: a typical set, i.e. a collection of points.] 

However, the circle is not a set because it is not a collection of points. In a set, 
two points x and y are either in the same connected component, in which case they 
are equal in a unique way (up to homotopy), or they are in distinct components, in 
which case they are not equal. In other words, if they are equal, they should be 
uniquely so. This suggests defining the following predicate for sets: 


isSet : ∀ {i} → Type i → Type i 
isSet A = (x y : A) (p q : x ≡ y) → p ≡ q 


Examples of sets. For instance, booleans form a set 


Bool-isSet : isSet Bool 
Bool-isSet false false refl refl = refl 
Bool-isSet true true refl refl = refl 


Natural numbers also form a set. First observe that, since equality is a 
congruence, every path q : m = n induces a path p : m + 1 = n + 1: 


suc-≡ : {m n : ℕ} → (m ≡ n) → (suc m ≡ suc n) 
suc-≡ p = ap suc p 


The path p is constructed from the path q by a direct application of ap, which 
is another name for cong. Now, we can show a lemma stating that every path 
p : m + 1 = n + 1 is of this form, i.e. there are no more paths between successors 
than those induced by congruence: 


suc-pred-≡ : {m n : ℕ} → 
             (p : suc m ≡ suc n) → p ≡ ap suc (ap pred p) 
suc-pred-≡ refl = refl 


From there, we can show that ℕ is a set, i.e. that any two paths p, q : m = n 
between natural numbers are equal. We proceed by induction on m and n. The 
base case where both are 0 is obvious; for the inductive case where both are 
successors, we can use the above lemma to reduce to the case where both p and 
q are obtained by congruence and we can use the induction hypothesis: 


ℕ-isSet : isSet ℕ 
ℕ-isSet zero zero refl refl = refl 
ℕ-isSet (suc m) (suc n) p q = 
  p                  ≡⟨ suc-pred-≡ p ⟩ 
  ap suc (ap pred p) ≡⟨ ap (ap suc) (ℕ-isSet m n (ap pred p) (ap pred q)) ⟩ 
  ap suc (ap pred q) ≡⟨ sym (suc-pred-≡ q) ⟩ 
  q                  ∎ 


More generally, all the basic datatypes we usually use (natural numbers, strings, 
etc.) are sets. This includes the types from the previous section, since one 
can show that every proposition is a set, see below. Moreover, all usual type 
constructors (lists, vectors, etc.) preserve the fact of being a set. 


Exercise 9.3.2.1. Show that the type List A is a set when A is a set. 




Closure properties. Sets are closed under most usual operations (products, 
coproducts, arrows, Π-types, Σ-types), as expected from set theory. As an 
illustration, let us show the closure under products. Recall from section 9.1.5 that, 
given two types A and B, a pair of paths p : x = x' in A and q : y = y' in 
B canonically induces a path from (x, y) to (x', y') in A × B, that we abusively 
write (p, q) : (x, y) = (x', y') here: 


×-≡ : ∀ {i j} {A : Type i} {B : Type j} {x x' : A} {y y' : B} → 
      x ≡ x' → y ≡ y' → (x , y) ≡ (x' , y') 
×-≡ refl refl = refl 


Moreover, every path in A × B is equal to a path of this form. More precisely, a 
path p : (x, y) = (x', y') in A × B induces, by congruence under the projections, 
paths p_A : x = x' and p_B : y = y', and the path induced by p_A and p_B using 
the previous function is equal to p, i.e. p = (p_A, p_B): 


×-≡-η : ∀ {i} {j} {A : Type i} {B : Type j} 
        {z z' : A × B} {p : z ≡ z'} → 
        p ≡ ×-≡ (ap fst p) (ap snd p) 
×-≡-η {p = refl} = refl 


Finally, we can use this to show that the product A × B of the sets A and B is 
itself a set. Namely, given parallel paths p and q in A × B, we have 

p = (p_A, p_B) = (q_A, q_B) = q 

where the first and last equalities come from the previous observation, and the 
one in the middle follows from the fact that we have p_A = q_A and p_B = q_B 
because both A and B are sets: 


×-isSet : ∀ {i j} {A : Type i} {B : Type j} → 
          isSet A → isSet B → isSet (A × B) 
×-isSet SA SB (x , y) (x' , y') p q = 
  p                         ≡⟨ ×-≡-η ⟩ 
  ×-≡ (ap fst p) (ap snd p) ≡⟨ ap2 ×-≡ 
                                 (SA x x' (ap fst p) (ap fst q)) 
                                 (SB y y' (ap snd p) (ap snd q)) ⟩ 
  ×-≡ (ap fst q) (ap snd q) ≡⟨ sym ×-≡-η ⟩ 
  q                         ∎ 


Propositions are sets. Any proposition is a set [Uni13, Lemma 3.3.4]. This is 
intuitively expected because a proposition should be either empty or a point, 
and thus a particular case of a collection of points. Consider a proposition A, 
and two paths p,q: x = y between points x and y of A. In order to show that A 
is a set, we have to show that the paths p and q are equal, which is not easily 
done directly. Instead, we are going to show that both are equal to a third 
“canonical” path. 

Fix a point z in A. Since A is a proposition, for every point x of A, there is a 
path p_x : z = x. We now have a candidate for the canonical path: let’s show 
that p = p_x⁻¹ ∙ p_y. By induction on p, this is immediate, since when p = refl(x), 
we have refl(x) = p_x⁻¹ ∙ p_x, see section 9.2.2: 


aProp-isSet-lem : ∀ {i} {A : Type i} {x y : A} → (P : isProp A) → 
                  (z : A) (p : x ≡ y) → p ≡ ! (P z x) ∙ (P z y) 
aProp-isSet-lem {x = x} P z refl = sym (∙-inv-l (P z x)) 


Similarly, we can show that q = p_x⁻¹ ∙ p_y, and therefore deduce p = q, i.e. A is a 
set: 


aProp-isSet : ∀ {i} {A : Type i} → isProp A → isSet A 
aProp-isSet {A = A} P x y p q = 
  (aProp-isSet-lem P x p) ∙ (sym (aProp-isSet-lem P x q)) 


This result allows deducing the fact that being a proposition is itself a 
proposition using dependent function extensionality [Uni13, Lemma 3.3.5]. Namely, 
consider a type A and two proofs f, g that A is a proposition: those are functions 
taking two elements x and y of A and producing a path x = y. By extensionality, 
it is enough to show that we have f x y = g x y for all points x, y : A, which 
follows from the previous result: A is a proposition (by f or g), hence a set, so 
that any two paths of type x = y are equal. 


isProp-isProp : ∀ {i} {A : Type i} → isProp (isProp A) 
isProp-isProp f g = 
  funext2 (λ x y → aProp-isSet f x y (f x y) (g x y)) 


Above, funext2 is the obvious variant of funext for functions with two argu- 
ments. 


Hedberg’s theorem. An abstract reason why most usual types are sets is because 
they have decidable equality: Hedberg’s theorem states that any type with a 
decidable equality is necessarily a set [Hed98, KECA16] and [Uni13, section 7.2]. 
For instance, we can decide the equality of natural numbers (see section 6.6.8), 
therefore they form a set (which we have already proved directly above). 

We recall that a type A is said to be decidable when ¬A ⊔ A holds, i.e. we 
can either show that it is empty or produce an element of it: 


isDec : ∀ {i} (A : Type i) → Type i 
isDec A = ¬ A ⊔ A 


In particular, a type A has decidable equality when we can decide whether any 
two elements of A are equal or not: 


isDecEq : ∀ {i} (A : Type i) → Type i 
isDecEq A = (x y : A) → isDec (x ≡ y) 


Although this is the property usually considered, it will turn out to be more 
convenient here to consider a variant of this property. We say that a type A has 
the property of double negation elimination if ¬¬A → A: 


isNNE : ∀ {i} → Type i → Type i 
isNNE A = ¬ (¬ A) → A 


and we write isNNEq when its equality has this property: 


isNNEq : ∀ {i} → Type i → Type i 
isNNEq A = (x y : A) → isNNE (x ≡ y) 


It is well known that decidability of a type implies double negation elimination: 


isDec-isNNE : ∀ {i} {A : Type i} → isDec A → isNNE A 
isDec-isNNE (inl a') a'' = ⊥-elim (a'' a') 
isDec-isNNE (inr a)  _   = a 


and therefore decidability of equality implies that equality has the double nega- 
tion property. In this section, by “having a decidable equality”, we will therefore 
without loss of generality mean “having an equality with the double negation 
elimination property”. 
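
For a concrete instance of a type with decidable equality, here is a sketch for booleans, obtained by inspecting the four possible cases. The definitions of isDec and isDecEq are repeated at the lowest universe level to keep the example self-contained, and the name Bool-isDecEq is an assumption of this illustration.

```agda
{-# OPTIONS --without-K #-}
open import Agda.Builtin.Equality using (_≡_; refl)
open import Agda.Builtin.Bool using (Bool; true; false)

data ⊥ : Set where

¬_ : Set → Set
¬ A = A → ⊥

data _⊔_ (A B : Set) : Set where
  inl : A → A ⊔ B
  inr : B → A ⊔ B

isDec : Set → Set
isDec A = (¬ A) ⊔ A

isDecEq : Set → Set
isDecEq A = (x y : A) → isDec (x ≡ y)

-- Equal booleans are related by refl; distinct ones cannot be
-- equal, which is expressed with an absurd function.
Bool-isDecEq : isDecEq Bool
Bool-isDecEq false false = inr refl
Bool-isDecEq false true  = inl (λ ())
Bool-isDecEq true  false = inl (λ ())
Bool-isDecEq true  true  = inr refl
```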

Suppose that the type A has decidable equality. In order to show that A 
is a set, we have to show that any two paths p,q: x = y are equal. The 
proof strategy here is the same as above: we should show that p is equal to 
a “canonical” path of type x = y, the path q will similarly be equal to this 
path and we will be able to conclude. The fact that A has decidable equality 
provides us with a canonical path between x and y. Namely, the existence of 
the path p implies that we have a proof λk.k p of ¬¬(x = y) and the double 
negation elimination property provides us with a path x = y: 


nnePath : ∀ {i} {A : Type i} → isNNEq A → 
          {x y : A} → (p : x ≡ y) → x ≡ y 
nnePath N {x} {y} p = N x y (λ k → k p) 


This path is canonical, in the sense that it does not depend on the choice of 
the path p. Namely, we know from section 9.3.1 that the type ¬¬(x = y) is a 
proposition (any negation of a type is). In particular, given two paths p and q 
of type x = y, the proofs λk.k p and λk.k q of ¬¬(x = y) are equal and therefore 
induce equal paths of type x = y by elimination of double negation: 


nnePathIndep : ∀ {i} {A : Type i} (N : isNNEq A) {x y : A} 
               (p q : x ≡ y) → nnePath N p ≡ nnePath N q 
nnePathIndep N {x} {y} p q = 
  ap (N x y) (¬-isProp (λ k → k p) (λ k → k q)) 


In this way, we have constructed a canonical path p_{x,y} : x = y, which depends 
only on x and y. Finally, we want to show that p = p_{x,y}, i.e. the arbitrary path p 
is equal to the canonical one. By induction on p, this would require showing 
that refl(x) = p_{x,x}, and there is no reason why this should hold. So instead, we 
consider a variant of the canonical path and show that p = p_{x,x}⁻¹ ∙ p_{x,y}. Namely, 
by induction on p, we are left proving refl(x) = p_{x,x}⁻¹ ∙ p_{x,x}, which does hold, see 
section 9.2.2: 


nnePathEq : ∀ {i} {A : Type i} (N : isNNEq A) {x y : A} 
            (p : x ≡ y) → p ≡ ! (nnePath N refl) ∙ nnePath N p 
nnePathEq N {x} {y} refl = sym (∙-inv-l (N x x (λ z → z refl))) 


Finally, we can conclude that p = p_{x,x}⁻¹ ∙ p_{x,y} = q and therefore that A is a set: 


Hedberg : ∀ {i} {A : Type i} (N : isNNEq A) → isSet A 
Hedberg N x y p q = 
  p ≡⟨ nnePathEq N p ⟩ 
  (! (nnePath N refl) ∙ nnePath N p) 
    ≡⟨ ap (λ nnp → ! (nnePath N refl) ∙ nnp) (nnePathIndep N p q) ⟩ 
  (! (nnePath N refl) ∙ nnePath N q) 
    ≡⟨ sym (nnePathEq N q) ⟩ 
  q ∎ 


For instance, we have shown in section 6.6.8 that natural numbers have 
decidable equality. We thus have an alternative proof that they form a set by 
Hedberg’s theorem: 


ℕ-isSet : isSet ℕ 
ℕ-isSet = Hedberg (λ x y → isDec-isNNE (x ≟ y)) 


9.3.3 n-types. We now generalize the classification of types as propositions or 
sets into a full hierarchy of types. 


Groupoids. It can be observed that the definition of being a set can be reformu- 
lated as: 


isSet : ∀ {i} → Type i → Type i 
isSet A = (x y : A) → isProp (x ≡ y) 


i.e. a set is a type such that, for every pair of points x and y, the type x = y is 
a proposition. This reformulation suggests the next thing to try: we define a 
groupoid as a type such that for every pair of points x and y, the type x = y is 
a set: 


isGroupoid : ∀ {i} → Type i → Type i 
isGroupoid A = (x y : A) → isSet (x ≡ y) 


In a groupoid, two points x and y might be equal in multiple ways, but there 
should be at most one equality between two paths p,q: x = y. For instance, 
the circle (on the left) is a groupoid 


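[Figure: on the left, the circle, with two paths p and q between the points x and y; on the right, the sphere, with two paths p and q between the points x and y.] 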

but the sphere (on the right) is not a groupoid: between the point x and y there 
are two paths p and q and between those paths there are two non-homotopic 
paths (the deformations through the front or the back hemisphere). 


The hierarchy. Continuing in this way, we define the notion of n-type, or a type 
of homotopy level n, by recursion on n [Uni13, Chapter 7]: 

— a 0-type is a set, and 
— an (n+1)-type is a type such that the type x = y is an n-type, for all 
points x and y. 

In particular, a 1-type is a groupoid. 
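
The recursive definition above can be transcribed directly as a function defined by recursion on natural numbers. The following is a minimal sketch, in which the names is-type and one-type-is-groupoid are assumptions of this example and types are taken at the lowest universe level; it starts at n = 0 with sets, as in the definition above.

```agda
{-# OPTIONS --without-K #-}
open import Agda.Builtin.Equality using (_≡_; refl)
open import Agda.Builtin.Nat using (Nat; zero; suc)

isSet : Set → Set
isSet A = (x y : A) (p q : x ≡ y) → p ≡ q

isGroupoid : Set → Set
isGroupoid A = (x y : A) → isSet (x ≡ y)

-- is-type n A states that A is an n-type:
-- a 0-type is a set, and an (n+1)-type has n-types as path types.
is-type : Nat → Set → Set
is-type zero    A = isSet A
is-type (suc n) A = (x y : A) → is-type n (x ≡ y)

-- Sanity check: being a 1-type unfolds to being a groupoid.
one-type-is-groupoid : {A : Set} → is-type (suc zero) A ≡ isGroupoid A
one-type-is-groupoid = refl
```

The last definition is accepted because both sides compute to the same type, so that refl suffices.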

The intuition is that an n-type is a type which is trivial in dimensions higher 
than n, in the sense that it does not contain any non-trivial k-sphere for k > n. 
In low dimensions k, the k-spheres (or spheres in dimension k) can be pictured 
as follows: 


[Figure: the 0-sphere, the 1-sphere and the 2-sphere.] 


A 0-sphere thus consists of two points, a 1-sphere is a circle and a 2-sphere is a 
traditional sphere. For instance, 


— a set (a 0-type) may contain two distinct points (a 0-sphere) but not a 
circle (a 1-sphere), 


— a groupoid (a 1-type) may contain distinct points or circles but no 2-spheres, 


and so on. 


Negative types. The choice of n = 0 for sets is done in order to agree with 
traditional conventions in mathematics, but it can be extended a bit to negative 
numbers. We have seen that a set (a 0-type) is such that x = y is a proposition 
for every pair of points x and y, so that it makes sense to define a (−1)-type 
as a proposition: if we adopt this convention, a 0-type is a type in which x = y 
is a (−1)-type, in accordance with the above definition. 

Can we also make sense of a (−2)-type? In a (−1)-type, i.e. a proposition, 
for every pair of points x and y, we should have that x = y is a (−2)-type. Since 
in a proposition every pair of points is related by a unique path, a (−2)-type can 
be defined as a contractible type, i.e. a type which is a point up to homotopy, 
see below. If we go on with this reasoning, we find that a (−3)-type should still 
be a contractible type, so that we stop at dimension n = −2. 


Contractible types. In Agda, the predicate of being contractible for a type can 
be defined as 


isContr : ∀ {i} → Type i → Type i
isContr A = Σ A (λ x → (y : A) → x ≡ y)

It expresses the fact that a type is contractible when it contains a point x such
that for every point y there is a path p_y from x to y. For instance, the type ⊤
is contractible since every point of it is equal to the only constructor tt:

⊤-isContr : isContr ⊤
⊤-isContr = tt , (λ { tt → refl })

Once again, it might seem that the circle is contractible because there is a
path between any pair of points, but it is not so because the choice of the
path p_y has to be made continuously in y, which is not possible for the circle.
A contractible type is thus homotopy equivalent to a point:

[Figure: three spaces, the first two contractible and the third not contractible.]

Apart from ⊤, an interesting contractible type is the singleton at a point x
in a type A, which consists of all the points of A equal to x:


Singleton : ∀ {i} {A : Type i} → A → Type i
Singleton {A = A} x = Σ A (λ y → x ≡ y)

Such a type is always contractible:

Singleton-isContr : ∀ {i} {A : Type i} (x : A) →
                    isContr (Σ A (λ y → x ≡ y))
Singleton-isContr x = (x , refl) , λ { (y , refl) → refl }
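By contrast with ⊤ and singletons, a type with two distinct points cannot be contractible. The following minimal sketch checks this for the booleans. It is self-contained and therefore relies on assumptions which are not those of these notes: it uses Agda's builtin modules, writes Set where the notes write Type, and the names ⊥, true≢false, false≢true and Bool-notContr are only illustrative.

```agda
open import Agda.Primitive
open import Agda.Builtin.Equality
open import Agda.Builtin.Sigma
open import Agda.Builtin.Bool

-- The empty type (assumed name, defined locally to stay self-contained).
data ⊥ : Set where

-- Same definition as isContr above, with Set in place of Type.
isContr : ∀ {i} → Set i → Set i
isContr A = Σ A (λ x → (y : A) → x ≡ y)

-- Distinct constructors are never equal: both cases are absurd.
true≢false : true ≡ false → ⊥
true≢false ()

false≢true : false ≡ true → ⊥
false≢true ()

-- Whatever the chosen center is, it cannot be equal to the other boolean.
Bool-notContr : isContr Bool → ⊥
Bool-notContr (true  , p) = true≢false (p false)
Bool-notContr (false , p) = false≢true (p true)
```

The proof only inspects the center of contraction: the family of paths p is applied to the boolean which differs from it, which yields an impossible equality.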


Since a contractible type contains only one point up to homotopy, all its 
elements are necessarily equal, i.e. a contractible type is a proposition: 


Contr-isProp : ∀ {i} {A : Type i} → isContr A → isProp A
Contr-isProp (x , p) y z with p y | p z
... | refl | refl = refl


(we will generalize this below when showing the cumulativity property). 


n-types in Agda. We can define a predicate hasLevel such that hasLevel (n+2) A
holds when A is an n-type (we start at n = -2 instead of n = 0) by

hasLevel : ∀ {i} → ℕ → Type i → Type i
hasLevel zero    A = isContr A
hasLevel (suc n) A = (x y : A) → hasLevel n (x ≡ y)


Remark 9.3.3.1. Note that for a type A, being a (-1)-type according to the
above definition (i.e. satisfying hasLevel 1) requires slightly more than the
previous definition of propositions: for every pair of points x and y, there should
be a path p : x = y as before, but we should also show that for every other
path q : x = y, we have p = q. However, the second requirement is automatic if
we carefully choose the paths, so that the two definitions coincide:

isProp-is1Type : ∀ {i} → {A : Type i} → isProp A → hasLevel 1 A
isProp-is1Type p x y = ! (p x x) ∙ p x y ,
                       λ { refl → ∙-inv-l (p x x) }

(we are using here the same trick as for Hedberg’s theorem, see section 9.3.2).


Cumulativity. We have seen in section 9.3.2 that a proposition is a set. More
generally, following the same ideas, one can show that every n-type is an
(n + 1)-type. This entails that the hierarchy of n-types is cumulative in the
sense that an n-type is an m-type for every n < m. This is shown by induction
on n. For the base case, we have to show that a contractible type A (i.e. a
(-2)-type) is also a proposition (i.e. a (-1)-type). Since A is contractible there
is a point a in A and a path p_x : a = x for every point x in A. In order to show
that A is a proposition, we have to show that, for every pair of points x and y
in A, we have a path x = y: we can simply take (p_x)⁻¹ · p_y (and every other
path q : x = y is easily shown to be equal to this one by induction on q). The
inductive case is simple. Formally,

hasLevel-cumulative : ∀ {i} {n : ℕ} {A : Type i} →
                      hasLevel n A → hasLevel (suc n) A
hasLevel-cumulative {_} {zero} (a , p) x y =
  ! (p x) ∙ p y , λ { refl → ∙-inv-l (p x) }
hasLevel-cumulative {_} {suc n} L x y = hasLevel-cumulative (L x y)


The property of being an n-type. One can show that the property of being an 
n-type is a proposition: a type either is an n-type or not, but there cannot be 
multiple ways in which a type is an n-type. 

For the base case, one has to show that being contractible is a proposition.
Suppose given two proofs (x, p) and (y, q) that a type A is contractible, where x
(resp. y) is a point of A and p (resp. q) associates to every point z of A a
path x = z (resp. y = z). Showing that these two proofs are equal amounts to
showing that x = y, which is given by p_y, and that p = q. Assuming function
extensionality, this last point is equivalent to showing that, for every point z
in A, the paths p_z : x = z and q_z : y = z are equal, up to some transport of
the first. Since A is contractible (we have a proof (x, p) of it), it is a 0-type
(i.e. a set) by cumulativity, and therefore any two parallel paths in it are equal,
thus p_z = q_z:

isContr-isProp : ∀ {i} {A : Type i} → isProp (isContr A)
isContr-isProp {_} {A} (x , p) (y , q) =
  Σ-≡ (p y) (funext (λ z → fst (A-isSet y z _ (q z))))
  where
  A-isSet : hasLevel 2 A
  A-isSet = hasLevel-cumulative (hasLevel-cumulative (x , p))


The inductive case is handled immediately using function extensionality: 


hasLevel-isProp : ∀ {i} {A : Type i}
                  (n : ℕ) → isProp (hasLevel n A)
hasLevel-isProp zero = isContr-isProp
hasLevel-isProp (suc n) f g =
  funext2 (λ x y → hasLevel-isProp n (f x y) (g x y))


9.3.4 Propositional truncation. We would now like to construct an operation,
called propositional truncation, which turns an arbitrary type A into a
proposition ||A||, as detailed in [Uni13, section 3.7]. The intuition is that if
a term of type A is a particular proof that A holds, a term of type ||A|| is a
witness that there exists a proof of A, but does not contain the information
of an actual proof. Therefore, the type ||A|| should be empty when A is and
a point otherwise. If A is decidable, this operation is easy to define: either A
or ¬A holds, and we respectively define ||A|| = ⊤ or ||A|| = ⊥. However, since
we do not live in a classical world, we cannot define propositional truncation in
this way. A more faithful description is that the propositional truncation starts
from the type A and adds a path between any pair of points in order to turn it
into a proposition, see section 9.5.4.


Rules. Propositional truncation is not a definable operation and has to be added
as a new construction to the logic. We extend the syntax of expressions by

\[
e ::= \ldots \mid \|e\| \mid \|e\|_{\mathrm{isProp}} \mid |e| \mid \mathrm{rec}(e, e', x \mapsto e'')
\]

where

— ||A|| is the propositional truncation of A,
— ||A||_isProp is a proof that ||A|| is a proposition,
— |t| provides a proof that ||A|| is non-empty when there is a term t of type A, and
— rec(t, B, x ↦ u) is the eliminator for truncated types.


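The formation rules state that the propositional truncation ||A|| exists for every
type A and is a proposition:

\[
\frac{\Gamma \vdash A : \mathrm{Type}}{\Gamma \vdash \|A\| : \mathrm{Type}}\ (\|\|_{\mathrm{F}})
\qquad
\frac{\Gamma \vdash A : \mathrm{Type}}{\Gamma \vdash \|A\|_{\mathrm{isProp}} : \mathrm{isProp}(\|A\|)}\ (\|\|_{\mathrm{P}})
\]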


The introduction rule states that the propositional truncation ||A|| is non-empty
when A is:

\[
\frac{\Gamma \vdash t : A}{\Gamma \vdash |t| : \|A\|}\ (\|\|_{\mathrm{I}})
\]


The elimination rule states that if we have an element of ||A||, then we can
assume that we have an element of A provided that the type we are currently
proving (or “eliminating into”) is a proposition:

\[
\frac{\Gamma \vdash t : \|A\| \qquad \Gamma, x : A \vdash u : B \qquad \Gamma \vdash P : \mathrm{isProp}(B)}{\Gamma \vdash \mathrm{rec}(t, B, x \mapsto u) : B}\ (\|\|_{\mathrm{E}})
\]


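The computation rule states that the element of A given by the elimination rule
above is t when the witness given for ||A|| is |t|:

\[
\frac{\Gamma \vdash t : A \qquad \Gamma, x : A \vdash u : B \qquad \Gamma \vdash P : \mathrm{isProp}(B)}{\Gamma \vdash \mathrm{rec}(|t|, B, x \mapsto u) = u[t/x] : B}\ (\|\|_{\mathrm{C}})
\]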


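The uniqueness rule is

\[
\frac{\Gamma \vdash t : \|A\| \qquad \Gamma \vdash P : \mathrm{isProp}(A)}{\Gamma \vdash |\mathrm{rec}(t, A, x \mapsto x)| = t : \|A\|}\ (\|\|_{\mathrm{U}})
\]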


Remark 9.3.4.1. For simplicity, we have given the rules in the non-dependent
case, which is the most useful one in practice. For full generality, we should
allow B to depend on ||A|| and adapt the rules accordingly. For instance, the
elimination rule should be

\[
\frac{\Gamma \vdash t : \|A\| \qquad \Gamma, x : \|A\| \vdash B : \mathrm{Type} \qquad \Gamma, x : A \vdash u : B[|x|/x] \qquad \Gamma, x : \|A\| \vdash P : \mathrm{isProp}(B)}{\Gamma \vdash \mathrm{rec}(t, x \mapsto B, x \mapsto u) : B[t/x]}\ (\|\|_{\mathrm{E}})
\]


Definition. This construction can be implemented in Agda, by postulating axioms
corresponding to the rules. Formation is

postulate ∥_∥ : ∀ {i} → Type i → Type i
postulate ∥∥-isProp : ∀ {i} {A : Type i} → isProp ∥ A ∥

introduction is

postulate ∣_∣ : ∀ {i} {A : Type i} → A → ∥ A ∥

elimination is

postulate ∥∥-rec : ∀ {i j} {A : Type i} {B : Type j} →
                   isProp B → (A → B) → (∥ A ∥ → B)

computation is

postulate ∥∥-comp : ∀ {i j} {A : Type i} {B : Type j} →
                    (P : isProp B) (f : A → B) (x : A) →
                    ∥∥-rec P f ∣ x ∣ ≡ f x

and uniqueness is

postulate ∥∥-eta : ∀ {i} {A : Type i} (P : isProp A) (x : ∥ A ∥) →
                   ∣ ∥∥-rec P id x ∣ ≡ x


Logical connectives. Remember from section 9.3.1 that we had difficulties defining
the disjunction of propositions because the coproduct of two propositions is
not a proposition in general (we can only show that it is a set). Now that we
have the propositional truncation at hand, we can use it in order to squash the
result of the coproduct into a proposition. We can thus define disjunction as

_∨_ : ∀ {i j} → Type i → Type j → Type (lmax i j)
A ∨ B = ∥ A ⊔ B ∥

The disjunction of two propositions is now a proposition by definition. Similarly,
the existential quantification is a truncated variant of Σ-types:

∃ : ∀ {i j} → (A : Type i) → (A → Type j) → Type (lmax i j)
∃ A B = ∥ Σ A B ∥
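As an illustration of how truncated connectives are used, here is a hedged sketch showing that the truncated disjunction is commutative: since the goal is itself a truncation, hence a proposition, the eliminator applies. The sketch is self-contained and makes assumptions which differ from the notes: Set stands for Type, the coproduct is written ⊎ (because ⊔ is the maximum of levels in Agda's primitives), and swap and ∨-comm are illustrative names.

```agda
open import Agda.Primitive
open import Agda.Builtin.Equality

isProp : ∀ {i} → Set i → Set i
isProp A = (x y : A) → x ≡ y

-- Propositional truncation, postulated as in the text (Set instead of Type).
postulate
  ∥_∥       : ∀ {i} → Set i → Set i
  ∥∥-isProp : ∀ {i} {A : Set i} → isProp ∥ A ∥
  ∣_∣       : ∀ {i} {A : Set i} → A → ∥ A ∥
  ∥∥-rec    : ∀ {i j} {A : Set i} {B : Set j} →
              isProp B → (A → B) → ∥ A ∥ → B

-- Coproduct, written ⊎ here to avoid a clash with the level operator ⊔.
data _⊎_ {i j} (A : Set i) (B : Set j) : Set (i ⊔ j) where
  inl : A → A ⊎ B
  inr : B → A ⊎ B

_∨_ : ∀ {i j} → Set i → Set j → Set (i ⊔ j)
A ∨ B = ∥ A ⊎ B ∥

swap : ∀ {i j} {A : Set i} {B : Set j} → A ⊎ B → B ⊎ A
swap (inl a) = inr a
swap (inr b) = inl b

-- The target B ∨ A is a truncation, so we may eliminate the hypothesis.
∨-comm : ∀ {i j} {A : Set i} {B : Set j} → A ∨ B → B ∨ A
∨-comm = ∥∥-rec ∥∥-isProp (λ x → ∣ swap x ∣)
```

The same pattern (eliminate into a truncation, then re-truncate the result) gives the functoriality of truncation with respect to arbitrary maps.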


The axiom of choice. In order to illustrate the difference between operations
and their truncated variants, let us consider the possible implementations of the
axiom of choice in type theory, see [Uni13, section 3.8]. Recall from section 5.3.2
that, in set theory, a possible formulation of this axiom states that a
relation R ⊆ A × B between sets A and B such that every element x of A is in
relation with at least one element y of B contains a function. In type theory,
the naive translation of this is the formula

CAC : ∀ {i j k} → Type (lmax (lmax (lsuc i) (lsuc j)) (lsuc k))
CAC {i} {j} {k} = {A : Type i} {B : Type j}
                  (R : A → B → Type k) →
                  (r : (x : A) → Σ B (λ y → R x y)) →
                  Σ (A → B) (λ f → (x : A) → R x (f x))


It is called the constructive axiom of choice, or CAC, and we have seen in
section 6.5.8 that this formula is easily proved. Namely, the argument of type

(x : A) → Σ B (λ y → R x y)

witnesses the fact that every element of A is in relation with some element of B.
A term of this type is a function r which to every x ∈ A associates a pair
consisting of an element y ∈ B together with a proof that the pair (x, y) is in
the relation R. From this data it is easy to construct a function A → B (by
post-composing r with the first projection π) and a proof that we have (x, π(r(x)))
in the relation R for every x ∈ A:

cac : ∀ {i j k} → CAC {i} {j} {k}
cac R f = (λ x → fst (f x)) , (λ x → snd (f x))


In some sense this was “too easy”, because the function r directly provided us 
with a way to construct a suitable element of B from an element of A. 

A more faithful way of implementing the axiom of choice in type theory
consists, instead of supposing that we have a function r as above, in only
supposing the existence of such a function, i.e. that its propositional truncation is
inhabited, i.e. we use an existential quantification instead of a Σ-type. Similarly,
as a result, we only want to show that there exists a suitable function from A
to B, without explicitly constructing it. The “right” formulation of the axiom
of choice is thus:

AC : ∀ {i j k} → Type (lmax (lmax (lsuc i) (lsuc j)) (lsuc k))
AC {i} {j} {k} = {A : Type i} {B : Type j} →
                 isSet A → isSet B →
                 (R : A → B → Type k) →
                 ((x : A) (y : B) → isProp (R x y)) →
                 (r : (x : A) → ∃ B (λ y → R x y)) →
                 ∃ (A → B) (λ f → (x : A) → R x (f x))


Note that, since we are serious about homotopy levels, we have also restricted
to the case where A and B are sets and R x y is a proposition for every element
x of A and y of B (the axiom without this restriction would be inconsistent
with univalence [Uni13, Lemma 3.8.5]). There is also a dependent variant of
this axiom (where the type B is allowed to depend on A):

DAC : ∀ {i j k} → Type (lmax (lmax (lsuc i) (lsuc j)) (lsuc k))
DAC {i} {j} {k} = {A : Type i} {B : A → Type j} →
                  isSet A → ((x : A) → isSet (B x)) →
                  (R : (x : A) → B x → Type k) →
                  ((x : A) (y : B x) → isProp (R x y)) →
                  (r : (x : A) → ∃ (B x) (λ y → R x y)) →
                  ∃ ((x : A) → B x) (λ f → (x : A) → R x (f x))


It can be shown that AC and DAC are equivalent (exercise: show it). Finally,
these axioms are also equivalent to the following axiom

PAC : ∀ {i j} → Type (lmax (lsuc i) (lsuc j))
PAC {i} {j} = {A : Type i} {B : A → Type j} →
              isSet A → ((x : A) → isSet (B x)) →
              ((x : A) → ∥ B x ∥) → ∥ ((x : A) → B x) ∥

which is close to the usual alternative formulation of the axiom of choice: a
product of non-empty sets is non-empty, see section 5.3.2.

Diaconescu. We are now in a position to formally prove Diaconescu’s theorem,
which states that

the axiom of choice implies the excluded middle.

The traditional proof of this theorem was presented in section 5.3.3 and the
reader is advised to read again the proof there before going on with the current
section, which we learned from [Alt19]. We suppose here that both function
extensionality (see section 9.1.5) and propositional extensionality (see
section 9.3.1) hold, both being consequences of univalence.

We take for granted that the following formulation of the axiom of choice
holds

PAC : ∀ {i j} → Type (lmax (lsuc i) (lsuc j))
PAC {i} {j} = {A : Type i} {B : A → Type j} →
              ((x : A) → ∥ B x ∥) → ∥ ((x : A) → B x) ∥


It can be remarked that we are not very serious about homotopy levels, i.e. we 
do not restrict to the case where A and the B x are supposed to be sets: adding 
this does not bring any interesting difficulty, but makes the proofs a bit longer 
and thus more difficult to read. We suppose fixed an arbitrary proposition P in 
Type i for some level i (here also, P should be taken to be a proposition if we 
were more rigorous) and our goal is to show 


P ∨ ¬ P

We write U for the set of non-empty subsets of booleans:

U = Σ (Bool → Type i) (λ Q → ∃ Bool Q)


An element of U consists of a subset Q of the booleans, encoded here as a
predicate on booleans (Q b holds when a boolean b belongs to the set) together
with a proof that the set is non-empty (Q b holds for some boolean b). In
particular, this set contains two elements of interest for us: the set

F = {b ∈ Bool | b = 0 ∨ P}

which is non-empty because it contains 0, formalized as

F : U
F = (λ b → b ≡ false ∨ P) , ∣ false , ∣ inl refl ∣ ∣

and the set

T = {b ∈ Bool | b = 1 ∨ P}

which is non-empty because it contains 1, formalized as

T : U
T = (λ b → b ≡ true ∨ P) , ∣ true , ∣ inl refl ∣ ∣

An element Q of U consists of a subset Q′ of Bool together with a proof Q″
that Q′ is non-empty. The family consisting of all Q′ such that Q belongs to U
is thus a family of non-empty sets and, by the axiom of choice, it is non-empty:
we have a function f which to every element Q of U associates an element of Q′.
We will prove with the function

dec : ((Q : U) → Σ Bool (fst Q)) → P ∨ ¬ P

that this entails that P ∨ ¬P holds, from which we will be able to conclude as
explained above:


Diaconescu : isProp P → PAC → P ∨ ¬ P
Diaconescu prop ac = ∥∥-rec ∥∥-isProp dec
  (ac {A = U} {B = (λ Q → Σ Bool (fst Q))} (λ Q → snd Q))


The crux of this proof is thus the function dec. It proceeds by case analysis
on f F and f T:

— if f F is true then true = false ∨ P holds and thus P holds,
— if f T is false then false = true ∨ P holds and thus P holds,
— if f F is false and f T is true then we can show that ¬P holds.

The subtle case is the last one, when f F is false and f T is true, because this
entails that false = false ∨ P and true = true ∨ P hold, from which we
cannot extract information. However, we can show that ¬P holds in this case.
Namely, suppose that P holds (we write x for its proof) and let us deduce ⊥.
Since P holds, by definition of F and T we have F b ⇔ T b for every boolean b,
thus F b = T b by propositional extensionality, and thus F = T by function
extensionality:

F≡T : F ≡ T
F≡T =
  Σ-≡
    (funext
      λ {
        false → propext ∥∥-isProp ∥∥-isProp
                  ((λ _ → right x) , (λ _ → right x)) ;
        true  → propext ∥∥-isProp ∥∥-isProp
                  ((λ _ → right x) , (λ _ → right x))
      })
    (∥∥-isProp (transport (∃ Bool) _ (snd F)) (snd T))


From there, we can deduce that the boolean of f F is equal to the boolean
of f T (recall that f Q is a pair consisting of a boolean and a proof that it
belongs to Q):

fF≡fT : fst (f F) ≡ fst (f T)
fF≡fT = ap (λ Q → fst (f Q)) F≡T

However, we know that those booleans are respectively false and true, and we
can deduce that false = true

absurd : P → (fst (f F) ≡ false) → (fst (f T) ≡ true) →
         false ≡ true
absurd x ff ft = transport2 _≡_ ff ft fF≡fT


from which we derive an absurdity. The proof of dec is finally

dec : ((Q : U) → Σ Bool (fst Q)) → P ∨ ¬ P
dec f with inspect (f F) | inspect (f T)
dec f | (true , p) , _ | (true , q) , _ =
  ∥∥-rec ∥∥-isProp (⊔-elim (λ ()) (λ x → ∣ inl x ∣)) p
dec f | (true , p) , _ | (false , q) , _ =
  ∥∥-rec ∥∥-isProp (⊔-elim (λ ()) (λ x → ∣ inl x ∣)) p
dec f | (false , p) , _ | (false , q) , _ =
  ∥∥-rec ∥∥-isProp (⊔-elim (λ ()) (λ x → ∣ inl x ∣)) q
dec f | (false , p) , k | (true , q) , l =
  ∣ inr (λ x → case absurd x (ap fst k) (ap fst l) of λ ()) ∣


Note that we don’t directly match f F because we would lose the fact that 
the result of the match is equal to f F (and similarly for f T). Instead, we use 
inspect, which is defined by 


inspect : ∀ {i} {A : Type i} (x : A) → Σ A (λ y → x ≡ y)
inspect x = x , refl


and allows retrieving both the result of the match and the equality with the 
matched value (using the terminology from section 9.3.3, this function returns 
an element of the singleton at x). 
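To see on a smaller example why inspect is useful, the following sketch performs a case analysis on the result of f b while remembering the corresponding equation. It is only an illustration under stated assumptions: it is self-contained, uses Agda's builtin modules with Set in place of Type, writes the coproduct ⊎, and the function decide is a hypothetical name.

```agda
open import Agda.Primitive
open import Agda.Builtin.Equality
open import Agda.Builtin.Sigma
open import Agda.Builtin.Bool

-- Coproduct (written ⊎ here, since ⊔ denotes the maximum of levels).
data _⊎_ {i j} (A : Set i) (B : Set j) : Set (i ⊔ j) where
  inl : A → A ⊎ B
  inr : B → A ⊎ B

-- Same definition as in the text, with Set in place of Type.
inspect : ∀ {i} {A : Set i} (x : A) → Σ A (λ y → x ≡ y)
inspect x = x , refl

-- Matching on inspect (f b) instead of f b keeps the proof p relating
-- f b to the boolean on which we matched.
decide : (f : Bool → Bool) (b : Bool) → (f b ≡ true) ⊎ (f b ≡ false)
decide f b with inspect (f b)
... | (true  , p) = inl p
... | (false , p) = inr p
```

Matching directly on f b would give the two booleans but no equation, so neither inl nor inr could be filled in with a proof about f b.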


Revealing truncation. As explained above, propositional truncation erases proofs,
keeping only the existence of a proof. However, sometimes knowing the existence
of a witness is enough to reconstruct this witness [Esc19]. For instance,
suppose that we are given a function f : ℕ → ℕ and we know that this function
admits a root (i.e. a number n such that f(n) = 0), then we can actually
construct a root of f: we compute f(0), f(1), f(2), and so on, until we find a natural
number n such that f(n) = 0. The point is that knowing the existence of the
root ensures that this process will eventually terminate. This can be formalized
and we are going to prove

(∃(n : ℕ). f(n) = 0) → (Σ(n : ℕ). f(n) = 0)

or, unfolding the notations,

||Σ(n : ℕ). f(n) = 0|| → Σ(n : ℕ). f(n) = 0

We are thus able to extract a witness from knowing its existence. Note that
the fact that ℕ can be enumerated is crucial here: the implication ||A|| → A
does not hold in general, for an arbitrary type A. For instance, if f was of type
(ℕ → ℕ) → ℕ, we would not expect to be able to construct a root from knowing
its existence, because the type of functions ℕ → ℕ is not countable.
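The enumeration procedure described above can be sketched as follows. This is only an approximation of the idea and not the construction developed in the rest of this section: since Agda requires termination, the existence hypothesis is replaced here by an explicit bound (called fuel) on the number of candidates tried. The names Maybe, isZero and search are assumptions introduced for this self-contained illustration, which uses Agda's builtin naturals (Nat).

```agda
open import Agda.Builtin.Nat
open import Agda.Builtin.Bool
open import Agda.Builtin.Equality

data Maybe (A : Set) : Set where
  nothing : Maybe A
  just    : A → Maybe A

isZero : Nat → Bool
isZero zero    = true
isZero (suc _) = false

-- search fuel k f tests f k, f (k + 1), ... on at most fuel candidates
-- and returns the first root found, if any.
search : Nat → Nat → (Nat → Nat) → Maybe Nat
search zero       k f = nothing
search (suc fuel) k f with isZero (f k)
... | true  = just k
... | false = search fuel (suc k) f

-- The function n ↦ 3 - n (truncated subtraction) has 3 as smallest root.
test-found : search 10 0 (λ n → 3 - n) ≡ just 3
test-found = refl

-- With too little fuel, the search gives up before reaching the root.
test-gave-up : search 2 0 (λ n → 3 - n) ≡ nothing
test-gave-up = refl
```

Both tests hold by computation. In the formal development that follows, the role of the fuel is played by a witness m with P m, which bounds the search from above.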

So, suppose that we have a proof E of ∃(n : ℕ). f(n) = 0 and we want
to prove the proposition R which is Σ(n : ℕ). f(n) = 0. We cannot directly
provide the required natural number n (we cannot magically guess the root)
and we cannot use the hypothesis E: in order to do so, we would have to use
the eliminator for propositional truncation, which we cannot do because the
goal we are proving is not a proposition. Namely, the type R is a set, the set of
all roots of f, and not a proposition (f might admit multiple roots). However,
we can take a variant of this type in order to have a proposition: instead of
constructing any root of f, we are going to construct a particular one, say the
smallest one. Namely, the set R′ of natural numbers which are smallest roots
of f contains exactly one element (the smallest root of f) and will thus be a
proposition. We can prove it by using elimination of propositional truncation
on E and then conclude that we have an element of R because we have the
implication R′ → R (a smallest root of f is a root).

In Agda, we are going to reason on an arbitrary predicate P on natural 
numbers, our above example being the particular case where Pn is f(n) = 0. 
We can define a predicate isFirst such that a natural number n satisfies isFirst n 
when n is the smallest natural number for which P holds: 


isFirst : ∀ {i} (P : ℕ → Type i) → ℕ → Type i
isFirst P n = P n × ((m : ℕ) → P m → n ≤ m)


Moreover, using antisymmetry of the order on natural numbers, two smallest 
numbers satisfying a property are equal (in other words, the smallest natural 
number to satisfy a property is unique, when it exists): 


isFirst-≡ : ∀ {i} (P : ℕ → Type i) → {m n : ℕ} →
            isFirst P m → isFirst P n → m ≡ n
isFirst-≡ P {m} {n} (Pm , Fm) (Pn , Fn) =
  ≤-antisym (Fm n Pn) (Fn m Pm)


Using this, and the closure properties of propositions under conjunction and
Π-types, we can show that if P is a predicate on natural numbers, in the sense
that P n is a proposition for every natural number n, then the type of first
natural numbers to satisfy this predicate is a proposition:

first-isProp : ∀ {i} (P : ℕ → Type i) → ((n : ℕ) → isProp (P n)) →
               isProp (Σ ℕ (isFirst P))
first-isProp P prop =
  Σ-isProp
    (λ n → ∧-isProp
             (prop n)
             (Π-isProp (λ n → Π-isProp (λ Pn → ≤-isProp))))
    (λ m n → isFirst-≡ P)


Next, our goal is to show that if we know an arbitrary natural number m
satisfying a predicate P then we can construct the smallest one. In order to
perform inductions, it will be useful to consider the type of the smallest natural
number greater than or equal to a fixed number k satisfying a proposition:

isFirst-from : ∀ {i} → ℕ → (P : ℕ → Type i) → ℕ → Type i
isFirst-from k P n = isFirst (λ n → k ≤ n × P n) n


We will also use the following “downward” induction principle, which states that
if we know that P m holds and P (n + 1) implies P n for an arbitrary number
n < m, then P n holds for every n ≤ m. Formally, it can be expressed as

rec-down : ∀ {i} (P : ℕ → Type i) (m : ℕ) →
           P m → ((n : ℕ) → n < m → P (suc n) → P n) →
           (n : ℕ) → n ≤ m → P n


and its proof is left as an exercise to the reader. Suppose given a decidable
predicate P (i.e. P n ∨ ¬(P n) holds for every n), for which we know a number m
such that P m holds. By downward induction on k ≤ m, we can construct the
smallest number greater than or equal to k satisfying P. For the inductive step,
if we know the smallest one n from k + 1 on then the smallest one from k on is k
if P k is satisfied or n otherwise (we need to be able to decide P k to be able to
perform this case analysis). Formally,

find-first-from : ∀ {i} (P : ℕ → Type i) →
                  ((n : ℕ) → isDec (P n)) →
                  (m : ℕ) → P m →
                  (k : ℕ) → k ≤ m → Σ ℕ (λ n → isFirst-from k P n)
find-first-from P dec m Pm k k≤m =
  rec-down
    (λ k → Σ ℕ (λ n → isFirst-from k P n))
    m
    (m , (≤-refl , Pm) , (λ { n (m≤n , Pn) → m≤n }))
    ind
    k k≤m
  where
  ind : (k : ℕ) → k < m →
        Σ ℕ (λ n → isFirst-from (suc k) P n) →
        Σ ℕ (λ n → isFirst-from k P n)
  ind k k<m (n , Pn) with dec k
  ind k k<m (n , (k+1≤n , Pn) , Fn) | inl ¬Pk =
    n , (≤-trans (n≤1+n k) k+1≤n , Pn) ,
    λ { i (k≤i , Pi) →
      case split-≤ k≤i of λ {
        (inl k≡i) → ⊥-elim (¬Pk (transport P (sym k≡i) Pi)) ;
        (inr k<i) → Fn i (k<i , Pi)
      }
    }
  ind k k<m _ | inr Pk =
    k , (≤-refl , Pk) , λ { n (k≤n , Pn) → k≤n }

where split-≤ is

split-≤ : {m n : ℕ} → m ≤ n → (m ≡ n) ⊔ (m < n)


(proof left to the reader). We can thus construct the first natural number n
satisfying P, by applying the previous lemma to the case k = 0:

find-first : ∀ {i} (P : ℕ → Type i) → ((n : ℕ) → isDec (P n)) →
             (m : ℕ) → P m → Σ ℕ (λ n → isFirst P n)
find-first P dec m Pm with find-first-from P dec m Pm 0 z≤n
find-first P dec m Pm | n , (_ , Pn) , Fn =
  n , (Pn , λ n Pn → Fn n (z≤n , Pn))


It is now time to return to our original problem. Consider a function f : ℕ → ℕ
for which we have a proof E of ∃(n : ℕ). f(n) = 0. We can use the elimination
principle for propositional truncation in order to show Σ(n : ℕ). isFirst(f(n) = 0),
which is a proposition, and we are left with showing

(Σ(n : ℕ). f(n) = 0) → Σ(n : ℕ). isFirst(f(n) = 0)

i.e. knowing a root of f we have to construct the smallest one, which is precisely
the purpose of our find-first function above:

extract-first-root : (f : ℕ → ℕ) →
                     ∃ ℕ (λ n → f n ≡ zero) →
                     Σ ℕ (isFirst (λ n → f n ≡ zero))
extract-first-root f E =
  ∥∥-rec
    (first-isProp P (λ n → ℕ-isSet (f n) 0))
    (λ { (n , Pn) → find-first P (λ n → f n ≟ 0) n Pn })
    E
  where
  P : ℕ → Type₀
  P n = f n ≡ zero


Finally, we can conclude with our root extraction procedure: 


extract-root : (f : ℕ → ℕ) →
               ∃ ℕ (λ n → f n ≡ zero) →
               Σ ℕ (λ n → f n ≡ zero)
extract-root f E with extract-first-root f E
extract-root f E | n , Pn , _ = n , Pn


Relationship with double negation. Given a type A, the type ¬¬A is a
proposition (as is the negation of any type) and there is a canonical map from the
former to the latter:

¬¬-trunc : ∀ {i} {A : Type i} → A → ¬ (¬ A)
¬¬-trunc x k = k x


In this sense, double negation is very similar to propositional truncation,
except that the resulting type is “classical” in the sense that it satisfies the law of
elimination of double negation (or, equivalently, the excluded middle). If
propositional truncation ||A|| can be seen as a quotient of A (we identify all proofs),
then ¬¬A can be thought of as a further quotient, making the type classical.
This quotient is witnessed by the existence of a canonical function ||A|| → ¬¬A,
which can be constructed by

∥∥-¬¬ : ∀ {i} {A : Type i} → ∥ A ∥ → ¬ ¬ A
∥∥-¬¬ = ∥∥-rec ¬-isProp (λ x ¬x → ¬x x)
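The claim that a doubly negated type is “classical” can be checked directly: double negation elimination holds for any type of the form ¬¬A, without any axiom. The following is a minimal self-contained sketch, in which Set stands for Type, the empty type ⊥ and negation are redefined locally, and ¬¬-dne is an illustrative name.

```agda
open import Agda.Primitive

-- The empty type and negation, defined locally to stay self-contained.
data ⊥ : Set where

infix 3 ¬_
¬_ : ∀ {i} → Set i → Set i
¬ A = A → ⊥

-- Double negation elimination for a doubly negated type:
-- given f : ¬ A, the map (λ g → g f) refutes ¬ ¬ A, which contradicts k.
¬¬-dne : ∀ {i} {A : Set i} → ¬ ¬ (¬ ¬ A) → ¬ ¬ A
¬¬-dne k f = k (λ g → g f)
```

No such term exists for an arbitrary type in place of ¬¬A, which is precisely the difference between propositional truncation and double negation.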


In general, there is no converse map. In particular, for a proposition A, the
existence of such a map is equivalent to the type being “classical”, i.e. satisfying
the elimination of double negation:

¬¬-∥∥ : ∀ {i} {A : Type i} → isProp A →
        ((¬ ¬ A) → ∥ A ∥) ↔ ((¬ ¬ A) → A)
¬¬-∥∥ PA = (λ f ¬¬a → ∥∥-rec PA id (f ¬¬a)) ,
           (λ nne ¬¬a → ∣ nne ¬¬a ∣)

Thus, if we assume that the logic is classical, in the sense that every
proposition satisfies NNE, propositional truncation can be defined as double negation,
see [KECA16] and [Uni13, Exercise 3.14].

Impredicative definition. Instead of defining propositional truncation
axiomatically, we can almost encode it in the following way [Uni13, Exercise 3.15]:

∥_∥ : ∀ {i} (A : Type i) → Type i
∥_∥ {i} A = {B : Type i} → isProp B → (A → B) → B


The propositional truncation of a type A is a type ||A|| which, by definition, is
such that, for every proposition B, if we have a map A → B then we have a
map ||A|| → B, i.e. it satisfies the elimination rule (||||_E) stated earlier. We say
“almost” here because the above definition is not accepted by Agda: if A is a
type at level i then the type we have defined is not at level i but at level i + 1,
i.e. we should actually have given it the type

∥_∥ : ∀ {i} (A : Type i) → Type (lsuc i)

There are two ways out of it. The easy one is to simply disable universe checking
(with the option --type-in-type), but this makes the logic inconsistent, see
section 8.2. The other one is to adopt a principle weaker than having type in
type, called propositional resizing, which roughly says that a proposition in the
i-th universe can be seen as a proposition in the j-th universe for any i and j
(including i > j): after all, a proposition contains at most one element (up to
homotopy), so that it is reasonable to consider that size does not matter in this
case.

Anyhow, with this encoding, function extensionality allows proving that the
truncation is a proposition

∥∥-isProp : ∀ {i} {A : Type i} → isProp ∥ A ∥
∥∥-isProp = Π'-isProp (λ B → Π-isProp (λ p → Π-isProp (λ f → p)))

the truncation map is easy to define

∣_∣ : ∀ {i} {A : Type i} → A → ∥ A ∥
∣ x ∣ _ f = f x

the recursion principle is simple to show

∥∥-rec : ∀ {i j} {A : Type i} {B : Type j} →
         isProp B → (A → B) → ∥ A ∥ → B
∥∥-rec p f x = x p f

as well as the computation principle

∥∥-comp : ∀ {i j} {A : Type i} {B : Type j} →
          (p : isProp B) (f : A → B) (x : A) →
          ∥∥-rec p f ∣ x ∣ ≡ f x
∥∥-comp p f x = refl

and the uniqueness principle

∥∥-eta : ∀ {i} {A : Type i} (p : isProp A) (x : ∥ A ∥) →
         ∣ ∥∥-rec p id x ∣ ≡ x
∥∥-eta p x = funext2 (λ a f → a (f (x p id)) (x a f))

9.4 Univalence


As indicated before, we still lack ways to prove equalities which ought to hold 
in our geometric model. We now introduce the univalence axiom, due to
Voevodsky, which fixes this in a satisfactory way.


9.4.1 Operations with paths. In this section, we describe some operations 
involving paths, which will be useful in order to formulate and study univalence. 


Application. The first one, called ap, states that all functions preserve paths:
given a function f : A → B and a path p : x = y in A, we can construct a path
f(x) = f(y) in B, sometimes abusively written f(p), by “applying” (thus the
name) f to p:

ap : ∀ {i j} {A : Type i} {B : Type j} {x y : A} →
     (f : A → B) → x ≡ y → f x ≡ f y
ap f refl = refl


It can also be seen as a witness for the fact that equality is a congruence
(and we have already met this function under the name cong in section 6.6).
This application is compatible with concatenation of paths in the sense that

f(p · q) = f(p) · f(q)

∙-ap : ∀ {i j} {A : Type i} {B : Type j} {x y z : A} →
       (f : A → B) → (p : x ≡ y) → (q : y ≡ z) →
       ap f (p ∙ q) ≡ ap f p ∙ ap f q
∙-ap f refl q = refl

Similarly, if two functions are equal and we apply them to the same argument,
the results will also be equal:

happly : ∀ {i j} {A : Type i} {B : A → Type j}
         {f g : (x : A) → B x} →
         f ≡ g → (x : A) → f x ≡ g x
happly refl x = refl


Transport. Given a type A, a family of types B : A → Type can be thought
of as a family of spaces B(x), indexed by x in A, which varies continuously
in x. As an illustration, we have figured the type A below as a segment at the
bottom, and the type B(x) above each point x of A as a disk. In passing, the
space above, consisting of all the spaces B(x), thus depicts the type Σ(x : A).B.

[Figure: a segment A at the bottom, with a disk B(x) above each point x of A.]

Since the spaces B(z) vary continuously in z, given a path p : x = y in A and
a point a in B(x), if we make z evolve from x to y, the spaces B(z) will evolve
from B(x) to B(y) and the point a will induce a path from a to some point
in B(y). We call transport the operation which to every path p : x = y and
point a in B(x) associates the point b in B(y) resulting from “transporting” the
point a in B along the path p. Formally, it can be defined as

transport : ∀ {i j} {A : Type i} {x y : A} (B : A → Type j) →
            x ≡ y → B x → B y
transport B refl x = x


This can also be seen as the fact that equality is substitutive, meaning that, 
in a type, we can replace an element by an equal one, and we have already 
encountered this function under the name subst in section 6.6. 
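As an illustration of substitutivity, the following sketch derives symmetry of equality and the function ap from transport alone, simply by choosing a suitable family B. It is self-contained and relies on assumptions which differ from the notes: Agda's builtin equality is used and Set stands for Type; the definitions of sym and ap given here are alternatives to the ones by pattern matching.

```agda
open import Agda.Primitive
open import Agda.Builtin.Equality

-- Same definition as in the text, with Set in place of Type.
transport : ∀ {i j} {A : Set i} {x y : A} (B : A → Set j) →
            x ≡ y → B x → B y
transport B refl b = b

-- Symmetry: transport refl : x ≡ x in the family B z = (z ≡ x).
sym : ∀ {i} {A : Set i} {x y : A} → x ≡ y → y ≡ x
sym {x = x} p = transport (λ z → z ≡ x) p refl

-- Congruence: transport refl : f x ≡ f x in the family B z = (f x ≡ f z).
ap : ∀ {i j} {A : Set i} {B : Set j} {x y : A}
     (f : A → B) → x ≡ y → f x ≡ f y
ap {x = x} f p = transport (λ z → f x ≡ f z) p refl
```

In both cases the endpoint y of the path is the only thing that varies in the family, which is what allows the point refl above x to be moved above y.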

This transport function allows us to define a coercion function as a particular 
case: if two types A and B are equal (witnessed by a path p) then we can always 
transform an element of type A into an element of type B by transporting an 
element of A into an element of B in the family where the indexing type is Type, 
with the type A above each point A of Type: 


[Figure: the family over Type, with the type A drawn above each point A of Type.]


Formally, 


coe : ∀ {i} {A B : Type i} → (A ≡ B) → A → B
coe p x = transport (λ A → A) p x


Of course, it could also be defined directly by induction by 


coe : ∀ {i} {A B : Type i} → (A ≡ B) → A → B
coe refl x = x


Finally, we can define a variant of ap in the case where f is a dependent function,
i.e. its type is of the form Π(x : A).B(x). Given such a function and a path
p : x = y in A, we cannot expect to have f(x) = f(y) anymore because this
type does not even make sense: f(x) belongs to B(x) and f(y) belongs to B(y),
so that we cannot compare them for equality. What we can show however is
that if we transport f(x) along p to B(y) then the resulting element of B(y) is
equal to f(y):


The intuitive reason for this is that f has to be a continuous function from A 
to B. Formally, 


apd : ∀ {i j} {A : Type i} {B : A → Type j} {x y : A} →
      (f : (x : A) → B x) (p : x ≡ y) →
      transport B p (f x) ≡ f y
apd f refl = refl


9.4.2 Equivalences. We consider that two spaces are equivalent when they are 
“isomorphic up to homotopy”, i.e. they are homotopy equivalent, in the sense 
defined in section 9.2. We now formalize this notion, see [Uni13, Chapter 4] for
details. We will see that it behaves much like the notion of isomorphism. 


Quasi-invertibility and homotopy equivalences. Recall that two functions f and
g of type A → B are homotopic, which we write f ∼ g, when f(x) = g(x) for
every point x of A. Formally, this can be defined as


_∼_ : ∀ {i j} {A : Type i} {B : A → Type j}
      (f g : (x : A) → B x) → Type (lmax i j)
_∼_ f g = ∀ x → f x ≡ g x


Also recall that a function f : A → B is a homotopy equivalence when there
exists a function g : B → A such that g ∘ f ∼ id_A and f ∘ g ∼ id_B. This suggests
defining the predicate isQinv such that isQinv(f) holds when f is a homotopy
equivalence in this sense:


isQinv : ∀ {i j} {A : Type i} {B : Type j} →
         (A → B) → Type (lmax i j)
isQinv {A = A} {B = B} f =
  Σ (B → A) (λ g → ((g ∘ f) ∼ id) × ((f ∘ g) ∼ id))


The name comes from the fact that a function f satisfying this property is, in
this context, said to be quasi-invertible. Above, the identity is defined as


id : ∀ {i} {A : Type i} → A → A
id x = x


and the composition by


_∘_ : ∀ {i j k} {A : Type i} {B : Type j} {C : Type k} →
      (B → C) → (A → B) → A → C
(g ∘ f) x = g (f x)


Surprisingly, this definition turns out not to be a good one because it is not a
proper predicate: isQinv(f) is not a proposition in general (see [Uni13, Theorem
4.1.3] for a counter-example) and being quasi-invertible is thus not a property
of a function f: it involves more data. We can come up with a simple variant
of this definition which actually is a predicate: instead of requiring that the
left and the right inverse are the same, we leave the possibility for them to be
different. We say that a function f : A → B is an equivalence when there exist
g : B → A and g′ : B → A such that g ∘ f ∼ id_A and f ∘ g′ ∼ id_B. In Agda,


isEquiv : ∀ {i j} {A : Type i} {B : Type j} →
          (A → B) → Type (lmax i j)
isEquiv {A = A} {B = B} f =
  Σ (B → A) (λ g → (g ∘ f) ∼ id) ×
  Σ (B → A) (λ g → (f ∘ g) ∼ id)


and one can show that isEquiv(f) is a proposition for every function f [Uni13,
Theorem 4.2.13]. Note that every quasi-invertible map is canonically an
equivalence:


isQinv-isEquiv : ∀ {i j} {A : Type i} {B : Type j} {f : A → B} →
                 isQinv f → isEquiv f
isQinv-isEquiv (g , gf , fg) = (g , gf) , (g , fg)


There is also a converse map (which is not obvious to define), the subtle point 
being that the resulting pair of maps does not form an equivalence. That being 
said, all the equivalences we will construct in practice will be quasi-inverses. 
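
To make the notion of quasi-inverse concrete, here is a small self-contained sketch showing that swapping the components of a pair is quasi-invertible, being its own inverse. It is an illustration under assumptions: it uses Agda's builtin Σ and equality rather than the definitions of this chapter, writes Set for Type, and spells out the two homotopies pointwise. The names swap and swap-isQinv are ours.

```agda
module SwapSketch where

open import Agda.Primitive using (_⊔_)
open import Agda.Builtin.Equality using (_≡_; refl)
open import Agda.Builtin.Sigma using (Σ; _,_; fst; snd)

-- Non-dependent product as a particular case of Σ.
_×_ : ∀ {i j} → Set i → Set j → Set (i ⊔ j)
A × B = Σ A (λ _ → B)

-- Quasi-invertibility, with both homotopies written pointwise.
isQinv : ∀ {i j} {A : Set i} {B : Set j} → (A → B) → Set (i ⊔ j)
isQinv {A = A} {B = B} f =
  Σ (B → A) (λ g → ((x : A) → g (f x) ≡ x) × ((y : B) → f (g y) ≡ y))

-- Swapping the two components of a pair.
swap : ∀ {i j} {A : Set i} {B : Set j} → A × B → B × A
swap (x , y) = y , x

-- swap is a quasi-inverse for swap. Both homotopies hold by reflexivity
-- because the builtin Σ is a record with eta-equality.
swap-isQinv : ∀ {i j} {A : Set i} {B : Set j} →
              isQinv (swap {A = A} {B = B})
swap-isQinv = swap , (λ _ → refl) , (λ _ → refl)
```

With the definitions of the text, applying isQinv-isEquiv to such a proof would give an equivalence between A × B and B × A.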


Contractibility. The notion of equivalence can be thought of as an “up-to-homotopy”
version of the notion of bijection in set theory. We can therefore
try to mimic the usual characterization of bijections: a function f : A → B is a
bijection when every element y in B has a unique preimage under f, i.e. f⁻¹(y)
is a singleton. In homotopy type theory, the analogue of the notion of preimage
is given by the fiber of f at y which is the space of points x in A equipped with
a path from f(x) to y:


fib : ∀ {i j} {A : Type i} {B : Type j} →
      (A → B) → B → Type (lmax i j)
fib {A = A} f y = Σ A (λ x → f x ≡ y)


We then say that a map is contractible when all its fibers are: 


isContrMap : ∀ {i j} {A : Type i} {B : Type j} →
             (A → B) → Type (lmax i j)
isContrMap {B = B} f = (y : B) → isContr (fib f y)


It can be shown, for a map f, that the types isEquiv(f) and isContrMap(f) are
equivalent, so that we could use contractibility as an alternative definition for
being an equivalence.

Equivalence of types. Two types A and B are equivalent when there is an
equivalence from A to B, which we write A ≃ B:


_≃_ : ∀ {i j} (A : Type i) (B : Type j) → Type (lmax i j)
A ≃ B = Σ (A → B) isEquiv


This relation is an equivalence relation. It is reflexive: 


≃-refl : ∀ {i} {A : Type i} → A ≃ A
≃-refl = id , (id , (λ x → refl)) , (id , (λ x → refl))


transitive: 


≃-trans : ∀ {i j k} {A : Type i} {B : Type j} {C : Type k} →
          A ≃ B → B ≃ C → A ≃ C
≃-trans (f , (g , gf) , (g' , fg')) (h , (i , ih) , (i' , hi')) =
  (h ∘ f) ,
  ((g ∘ i) , λ x → trans (ap g (ih (f x))) (gf x)) ,
  ((g' ∘ i') , λ x → trans (ap h (fg' (i' x))) (hi' x))


but also symmetric, which is not obvious because the definition of equivalence 
is not: 


≃-sym : ∀ {i j} {A : Type i} {B : Type j} → A ≃ B → B ≃ A
≃-sym {B = B} (f , (g , gf) , (g' , fg')) =
  g , (f , left) , (f , gf)
  where
  g-g' : (x : B) → g x ≡ g' x
  g-g' x = trans (sym (ap g (fg' x))) (gf (g' x))
  left : (x : B) → f (g x) ≡ x
  left x = trans (ap f (g-g' x)) (fg' x)


An equivalence e consists of a map f : A → B together with two maps
g, g′ : B → A which are respectively left and right inverse for f. We can define
a function which to such an equivalence associates the corresponding f:


≃-→ : ∀ {i j} {A : Type i} {B : Type j} → A ≃ B → A → B
≃-→ (f , _) = f


and one associating the corresponding g: 


≃-← : ∀ {i j} {A : Type i} {B : Type j} → A ≃ B → B → A
≃-← (f , (g , _) , _) = g


It will also be useful to have a notation for the proof that g is a left inverse
for f, i.e. x = g(f(x)) for every x in A:


≃-η : ∀ {i j} {A : Type i} {B : Type j}
      (e : A ≃ B) (x : A) → x ≡ ≃-← e (≃-→ e x)
≃-η (f , (g , gl) , (h , hr)) x = sym (gl x)


We also define one providing a proof that g is a right inverse for f, i.e. we have
f(g(y)) = y for every y in B:


≃-ε : ∀ {i j} {A : Type i} {B : Type j}
      (e : A ≃ B) (y : B) → ≃-→ e (≃-← e y) ≡ y
≃-ε (f , (g , gl) , (h , hr)) y =
  f (g y)         ≡⟨ sym (ap (λ y → f (g y)) (hr y)) ⟩
  f (g (f (h y))) ≡⟨ ap f (gl (h y)) ⟩
  f (h y)         ≡⟨ hr y ⟩
  y               ∎


Note that the proof is slightly more complicated than the previous one because 
we show here that g and not g’ is a right inverse for f. 

Finally, we show a last useful theorem. In set theory, a function f : A → B
which is bijective, i.e. which admits an inverse g, is always injective. This means
that for all elements x and y of A, if f(x) = f(y) then x = y. Namely, we
have


x = g(f(x)) = g(f(y)) = y


This property also holds in our context: 


≃-inj : ∀ {i j} {A : Type i} {B : Type j}
        (e : A ≃ B) {x y : A} → ≃-→ e x ≡ ≃-→ e y → x ≡ y
≃-inj e {x} {y} p =
  x               ≡⟨ ≃-η e x ⟩
  ≃-← e (≃-→ e x) ≡⟨ ap (≃-← e) p ⟩
  ≃-← e (≃-→ e y) ≡⟨ sym (≃-η e y) ⟩
  y               ∎


9.4.3 Univalence. We can easily define a function which shows that two equal 
types A and B are equivalent: 


id-to-equiv' : ∀ {i} {A B : Type i} → (A ≡ B) → (A ≃ B)
id-to-equiv' refl = id , (id , (λ _ → refl)) , (id , (λ _ → refl))

In words, by induction on the equality p : A = B we can suppose that A and B
are the same, and in this case we can take the identity function as equivalence
between the two types, left and right inverses being the identity. Given a path
p : A = B, note that the induced function A → B is precisely given by coe p,
so that it is conceptually better to define this operator as


id-to-equiv : ∀ {i} {A B : Type i} → (A ≡ B) → (A ≃ B)
id-to-equiv p = coe p , coe-isEquiv p


where the proof that coercion gives rise to equivalences is 


coe-isEquiv : ∀ {i} {A B : Type i} (p : A ≡ B) → isEquiv (coe p)
coe-isEquiv refl = (id , (λ x → refl)) , (id , (λ x → refl))


The univalence axiom introduced by Voevodsky states that this function is
itself an equivalence [Uni13, section 2.10]:


postulate univalence : ∀ {i} {A B : Type i} →
                       isEquiv (id-to-equiv {i} {A} {B})


i.e. we have an equivalence


ua-equiv : ∀ {i} {A B : Type i} → (A ≡ B) ≃ (A ≃ B)
ua-equiv = id-to-equiv , univalence


One of the main consequences of this axiom is that, since the types A = B and
A ≃ B are equivalent, there is a map


(A ≃ B) → (A = B)

which allows constructing a proof of equality from an equivalence:


ua : ∀ {i} {A B : Type i} → (A ≃ B) → (A ≡ B)
ua f = ≃-← ua-equiv f


This map can be seen as the proper introduction rule for equality, the elimination 
rule being id-to-equiv. The associated computation rule is 


ua-comp : ∀ {i} {A B : Type i} (e : A ≃ B) → coe (ua e) ≡ (fst e)
ua-comp {A = A} {B = B} e = ap fst (≃-ε ua-equiv e)


and the uniqueness rule is


ua-ext : ∀ {i} {A B : Type i} {p : A ≡ B} → p ≡ ua (id-to-equiv p)
ua-ext {p = p} = ≃-η ua-equiv p


Note that when A and B are types at level i, the type A ≃ B is also at
level i whereas A = B is at level i+1. It is therefore crucial that we allow 
equivalences to hold between types at different levels, which is why we really 
had to properly take care of universe levels in the developments in this chapter. 


9.4.4 Applications of univalence. The way univalence is quite often used is 
the following. It may happen that we have two different descriptions A and A’ 
of the same data. In this case, these types can be shown to be equivalent and
thus equal by ua. Since they are equal they can be used interchangeably: by 
transport, we can always convert a property on one into a property on the other. 

For instance, the coproduct type A ⊔ B can alternatively be defined as the
type

Σ(b : Bool).δ_{A,B} b

where δ_{A,B} : Bool → Type is the function such that δ_{A,B} false = A and
δ_{A,B} true = B. This means that we can describe an element of A ⊔ B as a
pair (b, x) where b is a boolean and x is an element of A (resp. B) when b is
false (resp. true). An equivalence


(A ⊔ B) ≃ (Σ(b : Bool).δ_{A,B} b)


is easily constructed, from which we can deduce


(A ⊔ B) = (Σ(b : Bool).δ_{A,B} b)


meaning that we can convert any property on one representation into a property
on the other representation. Similarly, for the product type A × B, we have

(A × B) = (Π(b : Bool).δ_{A,B} b)
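
The equivalence between the coproduct and the dependent sum over Bool described above can be sketched as follows. This is a hedged, self-contained illustration: it uses Agda's builtin booleans, Σ and equality instead of the definitions of this chapter, writes Set for Type, takes both types in the same universe for simplicity, and only constructs the two maps together with the proofs that they are mutually inverse. The names _⊎_, δ, to, from, from-to and to-from are ours.

```agda
module CoproductSketch where

open import Agda.Primitive using (Level)
open import Agda.Builtin.Bool using (Bool; true; false)
open import Agda.Builtin.Equality using (_≡_; refl)
open import Agda.Builtin.Sigma using (Σ; _,_)

-- Coproduct of two types living in the same universe.
data _⊎_ {i : Level} (A B : Set i) : Set i where
  inl : A → A ⊎ B
  inr : B → A ⊎ B

-- The family over Bool whose fibers are A (over false) and B (over true).
δ : ∀ {i} (A B : Set i) → Bool → Set i
δ A B false = A
δ A B true  = B

-- From the coproduct to pairs (b , x) with x in δ A B b.
to : ∀ {i} {A B : Set i} → A ⊎ B → Σ Bool (δ A B)
to (inl a) = false , a
to (inr b) = true , b

-- And back.
from : ∀ {i} {A B : Set i} → Σ Bool (δ A B) → A ⊎ B
from (false , a) = inl a
from (true , b)  = inr b

-- The two maps are mutually inverse, by case analysis.
from-to : ∀ {i} {A B : Set i} (x : A ⊎ B) → from (to x) ≡ x
from-to (inl a) = refl
from-to (inr b) = refl

to-from : ∀ {i} {A B : Set i} (p : Σ Bool (δ A B)) →
          to (from {A = A} {B = B} p) ≡ p
to-from (false , a) = refl
to-from (true , b)  = refl
```

Packaging to, from and the two proofs as a quasi-inverse, and then applying ua, would give the equality of types stated above.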

As a more programming-oriented example, natural numbers can either be 
defined in unary or binary representation, giving rise to equivalent types. By 
univalence, we can automatically transport any operation on one representation 
(e.g. addition) into the other. 




9.4.5 Describing identity types. Using univalence, we can describe the identity
types for most type constructions.


Identity types in products. Given types A and B, we expect that a path in A × B
consists of a pair of paths in A and B respectively, i.e. given x, x′ in A and y, y′
in B, we should have


Id_{A×B}((x, y), (x′, y′)) = Id_A(x, x′) × Id_B(y, y′)


By univalence, this amounts to constructing the corresponding equivalence
between types


Id_{A×B}((x, y), (x′, y′)) ≃ Id_A(x, x′) × Id_B(y, y′)

which is easily done:


×-≡ : ∀ {i j} {A : Type i} {B : Type j} {x y : A × B} →
      (x ≡ y) ≃ ((fst x ≡ fst y) × (snd x ≡ snd y))
×-≡ {x = x} {y = y} =
  f , (g , λ { refl → refl }) , (g , λ { (refl , refl) → refl })
  where
  f : x ≡ y → (fst x ≡ fst y) × (snd x ≡ snd y)
  f refl = refl , refl
  g : (fst x ≡ fst y) × (snd x ≡ snd y) → x ≡ y
  g (refl , refl) = refl


Identity types over natural numbers. For data types, similar characterizations
can be achieved. For instance, for natural numbers, we expect that there is one
proof of equality in Id_ℕ(n, n) for any natural number n and none in Id_ℕ(m, n)
for m ≠ n. In other words, we expect Id_ℕ(n, n) = ⊤ and Id_ℕ(m, n) = ⊥ for
m ≠ n. We can therefore code the expected type for identity types between any
two natural numbers as


code : ℕ → ℕ → Type₀
code zero    zero    = ⊤
code zero    (suc n) = ⊥
code (suc m) zero    = ⊥
code (suc m) (suc n) = code m n


By univalence, in order to show that natural numbers have the expected identity 
types, it is enough to show that there is an equivalence 


Id_ℕ(m, n) ≃ code m n

To this end, we define an encoding function


enc : {m n : ℕ} → m ≡ n → code m n
enc {zero} {.zero} refl = tt
enc {suc n} {.(suc n)} refl = enc {n} {n} refl


and a decoding function in the other direction 




dec : {m n : ℕ} → code m n → m ≡ n
dec {zero} {zero} tt = refl
dec {suc m} {suc n} c = ap suc (dec c)


and finally show that they form an equivalence: 


ℕ-eq : (m n : ℕ) → (m ≡ n) ≃ code m n
ℕ-eq m n =
  enc , ((dec , dec-enc) , (dec , enc-dec {m}))
  where
  dec-enc : {m n : ℕ} → (p : m ≡ n) → dec (enc p) ≡ p
  dec-enc {zero} {.zero} refl = refl
  dec-enc {suc m} {.(suc m)} refl = ap (ap suc) (dec-enc refl)
  enc-suc : {m n : ℕ} → (p : m ≡ n) → enc (ap suc p) ≡ enc p
  enc-suc refl = refl
  enc-dec : {m n : ℕ} → (c : code m n) → enc (dec {m} c) ≡ c
  enc-dec {zero} {zero} tt = refl
  enc-dec {suc m} {suc n} c =
    trans (enc-suc (dec {m} {n} c)) (enc-dec {m} {n} c)
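
As a small usage example, the encoding function alone already shows that distinct numbers are not equal, since code then computes to the empty type. The sketch below is self-contained under assumptions: it uses Agda's builtin naturals, unit type and equality, writes Set for Type₀, and declares its own empty type. The names code-2-2 and one≢zero are ours.

```agda
module NatCodeSketch where

open import Agda.Builtin.Nat using (Nat; zero; suc)
open import Agda.Builtin.Unit using (⊤; tt)
open import Agda.Builtin.Equality using (_≡_; refl)

-- The empty type.
data ⊥ : Set where

-- Expected identity types of natural numbers, as in the text.
code : Nat → Nat → Set
code zero    zero    = ⊤
code zero    (suc n) = ⊥
code (suc m) zero    = ⊥
code (suc m) (suc n) = code m n

enc : {m n : Nat} → m ≡ n → code m n
enc {zero}  refl = tt
enc {suc m} refl = enc {m} {m} refl

-- code 2 2 computes to ⊤, whose only element is tt.
code-2-2 : code 2 2
code-2-2 = tt

-- code 1 0 computes to ⊥, so encoding a hypothetical proof of 1 ≡ 0
-- directly yields a contradiction.
one≢zero : 1 ≡ 0 → ⊥
one≢zero p = enc p
```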


9.4.6 Describing propositions. In this section, we use univalence to show
that a proposition is either ⊥ or ⊤. First, we expect that ⊥ is the only empty
type, i.e. that for every type A such that ¬A holds, A = ⊥. By univalence, this
amounts to showing ¬A → (A ≃ ⊥), which is easily done: the map A → ⊥ is
given by the argument ¬A and the map ⊥ → A is given by the elimination of
⊥. In Agda,


¬-≃-⊥ : ∀ {i} {A : Type i} → ¬ A → A ≃ ⊥
¬-≃-⊥ k = k ,
  (⊥-elim , λ x → ⊥-elim (k x)) ,
  (⊥-elim , λ x → ⊥-isProp _ _)


Similarly, ⊤ is the only contractible type, in the sense that any contractible type
is equivalent to ⊤:


Contr-≃-⊤ : ∀ {i} {A : Type i} → isContr A → A ≃ ⊤
Contr-≃-⊤ {A = A} (x , p) =
  f , (g , p) , (g , λ { tt → refl })
  where
  f : A → ⊤
  f _ = tt
  g : ⊤ → A
  g _ = x

Moreover, any non-empty proposition is contractible: 


aProp-isContr : ∀ {i} {A : Type i} → isProp A → A → isContr A
aProp-isContr PA x = x , (PA x)


From there, one easily deduces that a proposition is either ⊥ when empty


Prop-≃-⊥ : ∀ {i} {A : Type i} → isProp A → ¬ A → A ≃ ⊥
Prop-≃-⊥ PA k = ¬-≃-⊥ k


or ⊤ when non-empty


Prop-≃-⊤ : ∀ {i} {A : Type i} → isProp A → A → A ≃ ⊤
Prop-≃-⊤ PA x = Contr-≃-⊤ (aProp-isContr PA x)


Decidable propositions. Above, since the logic is not classical, we need to be
provided with a proof of ¬A or a proof of A in order to decide whether the
proposition A is ⊥ or ⊤. But it is not the case that every proposition is either
⊥ or ⊤, i.e. that Prop = Bool. However, this does hold for propositions which
are decidable:


dec-Prop : ∀ {i} → Σ (Type i) (λ A → isProp A ∧ isDec A) ≃ Bool


(the proof is left as an exercise to the reader). By univalence, the above equiva- 
lence can be turned into an equality, thus providing a conceptually much better 
definition of booleans than the type with two elements: booleans is the type of 
decidable propositions! 


Truncation of propositions. Finally, we mention that propositional truncation
is idempotent on propositions, meaning that ∥A∥ = A when A is a proposition.
By univalence, this amounts to showing ∥A∥ ≃ A: the map ∥A∥ → A is given
by elimination of truncation and the map A → ∥A∥ is truncation.


trunc-prop : ∀ {i} {A : Type i} → isProp A → ∥ A ∥ ≃ A
trunc-prop P =
  ∥∥-rec P id ,
  (∣_∣ , λ _ → ∥∥-isProp _ _) ,
  (∣_∣ , λ _ → P _ _)


Describing contractible types. In a similar way as we have been able to describe
all propositions as being either ⊥ or ⊤, we can of course characterize contractible
types as being ⊤. Namely, one can show that any contractible type is equivalent
to ⊤:


Contr-≃-⊤ : ∀ {i} {A : Type i} → isContr A → A ≃ ⊤
Contr-≃-⊤ {A = A} (x , p) =
  f , (g , p) , (g , λ { tt → refl })
  where
  f : A → ⊤
  f _ = tt
  g : ⊤ → A
  g _ = x


from which any contractible type can be shown to be equal to ⊤ by univalence.
In other words, ⊤ is the only contractible type.


9.4.7 Incompatibility with set theoretic interpretation. The axiom of 
univalence forces types to act as spaces. If we assume this axiom, then it cannot 
be the case that we think of them as spaces but they are secretly sets: some 
of them have to exhibit non-trivial geometric structure. In order to show this, 
we shall first show that there is at least one type which is not a set: Type.
Indeed, we are going to show that it contains an element, namely the type
Bool of booleans, which has a non-trivial loop (i.e. a path from Bool to Bool), 
whereas in a set every loop has to be equal to the identity path. 


A non-trivial path. Consider the negation operator not : Bool → Bool on
booleans, which sends false to true and vice versa. This function is easily shown 
to be involutive: applying negation twice gets us back to the boolean we started 
with. 


not-involutive : (b : Bool) → not (not b) ≡ b
not-involutive false = refl
not-involutive true = refl


From there, we can show that negation induces an equivalence from booleans to
themselves:


not-≃ : Bool ≃ Bool
not-≃ = not , (not , not-involutive) , (not , not-involutive)


Of course, we have seen that equivalence is reflexive, so that we have A ≃ A
for every type A, including A = Bool, but the equivalence Bool ≃ Bool above is
non-trivial, in the sense that it exchanges false and true. By univalence, this
equivalence will induce a path


p : Bool = Bool


which will not be the identity path. Geometrically, we can picture the situation 
as follows. The type Bool is a point in the space of all types, which contains a 
loop p on it induced by negation: 


[Figure: the point Bool with a loop p on it.]


If we assume that Type is a set, then we will assimilate this path to the identity 
path, which will lead to a contradiction, because it will also force us to identify 
false and true, which we know is not the case: 


false≢true : ¬ (false ≡ true)
false≢true ()


Namely, the function coe p : Bool → Bool transports a boolean along p, and the
computation rule for univalence tells us that it is precisely negation. Now, if we
assume that Type is a set, the path p will be equal to the path refl : Bool = Bool
and therefore, we will have coe p = coe refl, i.e. the boolean negation function
is equal to the identity. If we apply both to true (using happly), we get that
false is equal to true, hence a contradiction.


Type-isn'tSet : ¬ (isSet Type₀)
Type-isn'tSet S = false≢true (
  false                ≡⟨ happly (ap coe (S Bool Bool refl (ua not-≃))) false ⟩
  coe (ua not-≃) false ≡⟨ happly (ua-comp not-≃) false ⟩
  true                 ∎)


Incompatibility of UIP with univalence. As an immediate consequence, the
uniqueness of identity proofs principle


UIP : ∀ {i} → Type (lsuc i)
UIP {i} = {A : Type i} {x y : A} → (p q : x ≡ y) → p ≡ q


is inconsistent with univalence because it forces every type to be a set: 


¬UIP : (∀ {i} → UIP {i}) → ⊥
¬UIP uip = Type-isn'tSet (λ x y → uip)


Incompatibility of double negation elimination with univalence. For similar
reasons, we cannot suppose that, for every type A, we have


¬¬A → A


see [Uni13, Theorem 3.2.2]. Intuitively, supposing this amounts to supposing
that we have picked a particular element in every non-empty type, and this
cannot reasonably be done in a continuous way.

In more detail, suppose that we have a function


nne : Π(A : Type).¬¬A → A


and write f for nne Bool : ¬¬Bool → Bool. We can easily construct an element
u of ¬¬Bool (this amounts to showing that Bool is non-empty), from which
we can construct an element b = f u of Bool and we can show that not b = b,
from which we are of course able to derive a contradiction. In order to show
the equality not b = b, the main idea is, as before, to transport f along the
non-trivial path p : Bool = Bool. The resulting function, when applied to u, can
be shown to be equal to both f u and not (f u).


¬NNE : (∀ {i} (A : Type i) → ¬ (¬ A) → A) → ⊥
¬NNE nne = not-≢ (f u) (
  not (f u)
    ≡⟨ sym (happly (ua-comp not-≃) (f u)) ⟩
  transport (λ A → A) p (f u)
    ≡⟨ ap (coe p) (ap f (¬-isProp u _)) ⟩
  transport (λ A → A) p (nne Bool (transport (λ A → ¬ (¬ A)) (! p) u))
    ≡⟨ sym (happly (transport-→ p (λ A → ¬ (¬ A)) (λ A → A) f) u) ⟩
  transport (λ A → ¬ (¬ A) → A) p f u
    ≡⟨ happly (apd nne p) u ⟩
  f u ∎)
  where
  u : ¬ (¬ Bool)
  u k = k false
  f : ¬ (¬ Bool) → Bool
  f = nne Bool
  p : Bool ≡ Bool
  p = ua not-≃


Another way to prove this consists in using Hedberg’s theorem presented 
in section 9.3.2. Namely, supposing that every type has the double negation 
property amounts to supposing that every type is decidable. In particular, any 
type should have decidable equality and thus be a set by Hedberg’s theorem. 
But we have shown above that Type is not a set, contradiction:


¬NNE : (∀ {i} (A : Type i) → ¬ (¬ A) → A) → ⊥
¬NNE nne = Type-isn'tSet (Hedberg (λ x y → nne (x ≡ y)))


The above remark does not mean that univalence is incompatible with classical
logic. It simply means that double negation elimination should be restricted
to propositions if one wants to use this as an axiom, see section 9.3.1.


9.4.8 Equivalences. Univalence makes equivalences behave like equalities. We
show here two instances of this which will be useful when proving function
extensionality in section 9.4.9.

Firstly, when we have a function f : A → B and an equality x = y between
elements x and y of A, we have seen that we have an induced equality
f(x) = f(y) by the function ap of section 9.4.1. A similar property can be
shown for equivalences:


≃-ap : ∀ {i j} {A B : Type i} (f : Type i → Type j) →
       A ≃ B → f A ≃ f B
≃-ap f e = id-to-equiv (ap f (ua e))


Secondly, the J rule presented in section 9.1.3 states that in order to prove 
a property P depending on a path p, it is enough to prove it only in the case 
where p is refl. This constitutes the induction principle for equalities. A similar 
induction principle can be shown for equivalences: 


≃-ind : ∀ {i j} (P : {A B : Type i} → (A ≃ B) → Type j) →
        ({A : Type i} → P (≃-refl {A = A})) →
        {A B : Type i} (e : A ≃ B) → P e
≃-ind {i} P r {A} {B} e =
  transport P (≃-ε ua-equiv e) (lem (ua e))
  where
  lem : (p : A ≡ B) → P (id-to-equiv p)
  lem refl = r


9.4.9 Function extensionality. We have already seen in section 9.4.5 that,
given types A and B, and elements x and x′ in A and y and y′ in B, we have


Id_{A×B}((x, y), (x′, y′)) = Id_A(x, x′) × Id_B(y, y′)


An equality in the product is thus the same as an equality in each of the
components. Now, we have seen in section 9.4.4 that a product is a particular case
of a dependent function


(A × B) = (Π(b : Bool).δ_{A,B} b)


and we therefore expect that the above characterization of paths in products 
generalizes to dependent functions. 
More precisely, we expect that for all functions f, g : Π(x : A).B, we have


Id_{Π(x:A).B}(f, g) = (f ∼ g)


i.e. the two functions f and g are equal when we have f x = g x for every
element x of A. While we will see that this is true, the proof performed for
products above does not generalize easily. Namely, our first hope is to prove
this identity using univalence, by showing an equivalence between the two types.
However, this is not easy. Constructing a map from left to right is not a problem:
the function happly defined in section 9.4.1 provides us with such a function


Id_{Π(x:A).B}(f, g) → (f ∼ g)


However, constructing a map in the other direction


(f ∼ g) → Id_{Π(x:A).B}(f, g)


is much more difficult. It is called function extensionality and corresponds to
the DFE axiom we have seen in section 9.1.5. Intuitively, it can be proved as
follows. The maps f and g being functions, they are of the form f = λx.f′
and g = λx.g′. Moreover, for every x, we have a path p_x : f x = g x, and thus
f′ = g′. By induction on this path, we can suppose that f′ and g′ are the same,
in which case we can conclude that the two functions are equal by reflexivity.
So, it seems that this could be proved using the following Agda code: 


hcontr : ∀ {i j} {A : Type i} {B : A → Type j}
         (f g : (x : A) → B x) → f ∼ g → f ≡ g
hcontr (λ x → f') (λ x → g') h with h x
hcontr (λ x → f') (λ x → .f') | refl = refl


Unfortunately, this code is not accepted by Agda: it does not allow pattern 
matching on functions (for good reasons) and we use an “arbitrary variable” x 
when matching on h x, which is not valid either. 


General approach. Instead, the trick is to show the equality for all pairs of
functions f and g at once, i.e. show


Σ(f : Π(x : A).B).Σ(g : Π(x : A).B).Id_{Π(x:A).B}(f, g)
≃ Σ(f : Π(x : A).B).Σ(g : Π(x : A).B).f ∼ g

We are thus led to consider the type

Path(A) = Σ(x : A).Σ(y : A).Id_A(x, y)

of all paths in a type A and the type

Homotopy(A, B) = Σ(f : Π(x : A).B).Σ(g : Π(x : A).B).f ∼ g


of all homotopies between functions from a type A to a type B. A homotopy
between a function f and a function g is a function which to every x in A
associates a path between f(x) and g(x), so that the type of all homotopies
between functions from A to B can alternatively be described as


Homotopy(A, B) = Π(x : A).Path(B(x))


We will adopt this definition since it leads to simpler developments.
For every type A, we have a function Path(A) → A which associates its
source to a path and a function A → Path(A) which constructs the constant
path on an element of A. These two can be shown to form an equivalence,
i.e. we have


Path(A) ≃ A


We therefore have the following sequence of equivalences

Homotopy(A, B) = Π(x : A).Path(B) ≃ Π(x : A).B ≃ Path(Π(x : A).B)

and in particular, we have a map

funext : Homotopy(A, B) → Path(Π(x : A).B)


which witnesses the extensionality of functions. At least, this is the general
plan: if we look at this proof in detail, there are some problems with it.

Firstly, the map funext above associates to each homotopy h between functions
f and g a path funext(h), but we have not shown yet that this path
actually also is between f and g and not some other functions. Here is how
we are going to prove it. In the type Homotopy(A, B), apart from h, there is
another notable homotopy, which we write h₀ here: the “constant” homotopy
between f and itself. It can be shown that funext(h₀) = funext(h), the proof
being just reflexivity, and therefore h₀ = h because funext is an equivalence,
and as such is injective. We have an equality between a homotopy f ∼ f
and a homotopy f ∼ g: by projecting on the target endpoint, we can deduce
that f = g.

Secondly, the middle equivalence


Π(x : A).Path(B) ≃ Π(x : A).B


is easy to prove only when B does not depend on x. Namely, we have an
equivalence Path(B) ≃ B: if B does not depend on x, we can simply apply
the function λB.(A → B) to it in order to obtain the desired equivalence. If B
depends on x, there is no easy way to proceed, at least if we do not suppose
function extensionality, which is precisely what we are trying to prove. The
plan will thus be to proceed in three steps:


1. show function extensionality in the non-dependent case as above,
2. use it to deduce another property called weak function extensionality,
3. use weak function extensionality to deduce dependent function extensionality.

Paths. Let us first define the type Path(A) of all paths in a type A, as well as
simple helper functions. This type can be formalized in Agda as


Path : ∀ {i} → (A : Type i) → Type i
Path A = Σ A (λ x → Σ A (λ y → x ≡ y))


We can define a function which to a path associates its source: 


Path-src : ∀ {i} {A : Type i} → Path A → A
Path-src (x , y , p) = x


and its target:


Path-tgt : ∀ {i} {A : Type i} → Path A → A
Path-tgt (x , y , p) = y


so that each path induces an equality from its source to its target: 


Path-≡ : ∀ {i} {A : Type i} → (p : Path A) →
         Path-src p ≡ Path-tgt p
Path-≡ (x , y , p) = p


Every identity can be seen as a path: 

Path-of : ∀ {i} {A : Type i} {x y : A} → (p : x ≡ y) → Path A
Path-of {x = x} {y = y} p = x , y , p

and we can easily construct constant paths: 


Path-cst : ∀ {i} {A : Type i} → A → Path A
Path-cst x = Path-of (refl {x = x})


Since every element of Path A consists of a path x = y, we can “contract” each 
of its elements to its source x and show the equivalence that we have already 
mentioned: 


Path-contract : ∀ {i} {A : Type i} → Path A ≃ A
Path-contract =
  (Path-src ,
  (Path-cst , λ { (_ , _ , refl) → refl }) ,
  (Path-cst , λ _ → refl))


Homotopies. We now define the type Homotopy(A, B) of all homotopies between
dependent functions from A to B:


Homotopy : ∀ {i j} (A : Type i) (B : A → Type j) → Type (lmax i j)
Homotopy A B = (x : A) → Path (B x)


The source function of the homotopy can be recovered by 


Homotopy-src : ∀ {i j} {A : Type i} {B : A → Type j} →
               Homotopy A B → (x : A) → B x
Homotopy-src h x = Path-src (h x)


and similarly for its target 


Homotopy-tgt : ∀ {i j} {A : Type i} {B : A → Type j} →
               Homotopy A B → (x : A) → B x
Homotopy-tgt h x = Path-tgt (h x)


so that every homotopy induces a homotopy in the previous sense between its 
source and its target: 


Homotopy-∼ : ∀ {i j} {A : Type i} {B : A → Type j}
             (h : Homotopy A B) → Homotopy-src h ∼ Homotopy-tgt h
Homotopy-∼ h x = Path-≡ (h x)


We can see a homotopy between two given functions as an element of this type


Homotopy-of : ∀ {i j} {A : Type i} {B : A → Type j}
              {f g : (x : A) → B x} → f ∼ g → Homotopy A B
Homotopy-of h x = Path-of (h x)


and given a function f : A → B, we can construct the constant homotopy f ∼ f:


Homotopy-cst : ∀ {i j} {A : Type i} {B : A → Type j} →
               ((x : A) → B x) → Homotopy A B
Homotopy-cst f = Homotopy-of (λ x → refl {x = f x})


Non-dependent function extensionality. Finally, we can show the promised
equivalence. As explained above, we restrict here to the non-dependent case, where B
does not depend on x:


Homotopy-≃-Path : ∀ {i j} {A : Type i} {B : Type j} →
                  Homotopy A (λ _ → B) ≃ Path (A → B)
Homotopy-≃-Path {i} {j} {A} {B} =
  (Homotopy A (λ _ → B)) ≃⟨ ≃-refl ⟩
  (A → Path B)           ≃⟨ ≃-to Path-contract ⟩
  (A → B)                ≃⟨ ≃-sym Path-contract ⟩
  Path (A → B)           ≃∎


where the function ≃-to is detailed below. From there, the non-dependent
function extensionality is easily deduced, its type being


FE {i} {j} = {A : Type i} {B : Type j} → {f g : A → B} →
             ((x : A) → f x ≡ g x) → f ≡ g


We can proceed as explained before, by considering the constant homotopy h₀ on
f and the homotopy h between f and g, showing that they have the same image
under the function ≃-→ Homotopy-≃-Path (the proof is simply refl because we
have carefully defined ≃-to, see below), deducing by injectivity that h₀ = h and
deducing that f = g by projecting on the respective targets of h₀ and h.


funext-nd : ∀ {i j} → FE {i} {j}
funext-nd {A = A} {B = B} {f = f} {g = g} h =
  ap (λ h x → Homotopy-tgt h x) p
  where
  p : Homotopy-cst f ≡ Homotopy-of h
  p = ≃-inj Homotopy-≃-Path refl
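
To illustrate how such a principle is used in practice, here is a small self-contained sketch which turns a pointwise equality of functions on natural numbers into an equality of functions. It is only an illustration under assumptions: function extensionality is postulated here (instead of being derived from univalence as in the text), Set is written for Type, Agda's builtin naturals and equality are used, and the names funext, +-zero and plus-zero-is-id are ours.

```agda
module FunextSketch where

open import Agda.Builtin.Nat using (Nat; zero; suc; _+_)
open import Agda.Builtin.Equality using (_≡_; refl)

-- Assumption: non-dependent function extensionality, postulated so that
-- this sketch stands alone.
postulate
  funext : ∀ {i j} {A : Set i} {B : Set j} {f g : A → B} →
           ((x : A) → f x ≡ g x) → f ≡ g

ap : ∀ {i j} {A : Set i} {B : Set j} {x y : A} (f : A → B) → x ≡ y → f x ≡ f y
ap f refl = refl

-- Pointwise, n + 0 is equal to n. This needs an induction because _+_
-- recurses on its first argument.
+-zero : (n : Nat) → n + 0 ≡ n
+-zero zero    = refl
+-zero (suc n) = ap suc (+-zero n)

-- The two functions are not convertible, so refl would be rejected here:
-- function extensionality turns the homotopy +-zero into a path.
plus-zero-is-id : (λ n → n + 0) ≡ (λ n → n)
plus-zero-is-id = funext +-zero
```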


Functions to equivalent types. The core of the series of equivalences proving
Homotopy-≃-Path is the function ≃-to which allows deducing


(A → B) ≃ (A → B′)


from

B ≃ B′


This is actually the only place where the univalence axiom is used. Since we
have application of functions to equivalences, this is actually pretty easy to
define:


≃-to : ∀ {i j} → {A : Type i} → {B B' : Type j} →
       B ≃ B' → (A → B) ≃ (A → B')
≃-to {A = A} e = ≃-ap (λ B → A → B) e


Given a function f : B → B′ which is an equivalence, we have “no control” over
the function (A → B) → (A → B′) which is the produced equivalence, which
complicates the proofs. However, there would be a natural candidate, namely
the function


λg x. f(g x) : (A → B) → (A → B′)


Enforcing this choice greatly simplifies the proofs. This can be done by
defining instead:


≃-to : ∀ {i j} → {A : Type i} → {B B' : Type j} →
  B ≃ B' → (A → B) ≃ (A → B')
≃-to {i} {j} {A} {B} {B'} e = (λ f x → (≃-→ e) (f x)) , lem e
  where
  lem : {B B' : Type j} (e : B ≃ B') →
    isEquiv (λ (f : A → B) x → (≃-→ e) (f x))
  lem = ≃-ind
    (λ {B} e → isEquiv (λ (f : A → B) x → (≃-→ e) (f x)))
    (λ {B} → snd (≃-refl {A = A → B}))


Weak function extensionality. In order to generalize function extensionality to
dependent types, we will first show another principle called weak function
extensionality, which states that a product of contractible types is itself
contractible. It can also be seen as a degenerate form of the axiom of choice
where the family of types we consider consists of types containing exactly one
element (up to homotopy). Formally, it can be stated as follows:


WFE {i} {j} = {A : Type i} {B : A → Type j} →
  ((x : A) → isContr (B x)) → isContr ((x : A) → B x)


Let us first explain why the “obvious proof” does not work. Suppose given a
family of contractible types B(x) indexed by x in A: for each x, there is an
element b_x in B(x) and a path p^x_y : b_x = y for every y in B(x). We are
therefore tempted to prove that Π(x : A).B can be contracted onto λx.b_x. To
show that this is the case, we have to construct, for every function f in
Π(x : A).B, a path λx.b_x = f. Since we have the paths p^x_{f(x)} : b_x = f(x),
we are almost there, but we cannot conclude since this would require function
extensionality, which is precisely what we are trying to prove...

The actual proof uses (non-dependent) function extensionality. Suppose
given a family of contractible types B(x) indexed by x in A. Each B(x) being
contractible, we have B(x) ≃ ⊤, and thus B(x) = ⊤ by univalence. Therefore,
B = λx.⊤ by function extensionality. By transport, instead of showing that
the type Π(x : A).B is contractible we are left with showing that the type
Π(x : A).⊤ is contractible, which is easy: it can be contracted to λx.tt.


wfunext : ∀ {i j} → WFE {i} {j}
wfunext {A = A} {B = B} c =
  transport (λ B → isContr ((x : A) → B x)) (sym p) contr
  where
  p : B ≡ (λ _ → Lift ⊤)
  p = funext-nd (λ x → ua (Contr-≃-Lift-⊤ (c x)))
  contr : ∀ {i} → isContr ((x : A) → Lift {i} ⊤)
  contr = (λ x → lift tt) , (λ f → funext-nd (λ x → refl))


Function extensionality. We can finally prove the (dependent) function
extensionality, whose type is


DFE {i} {j} =
  {A : Type i} {B : A → Type j} → {f g : (x : A) → B x} →
  ((x : A) → f x ≡ g x) → f ≡ g


Suppose given two dependent functions f and g of type Π(x : A).B, which
are homotopic (i.e. we have f ∼ g). Up to some minor details, those functions
can be seen as elements of type


Π(x : A).Σ(y : B).Id_B(f(x), y)


which we respectively call f′ and g′, the definition of the latter using the fact
that we have a homotopy. Recall that the type Σ(y : B).Id_B(f(x), y) is what
we called the singleton at f(x) and is contractible, see section 9.3.3; therefore, by
weak function extensionality, the above type is also contractible. The functions
f′ and g′ being elements of a contractible type, they are necessarily equal, from
which one easily deduces that f and g are equal.


funext : ∀ {i j} → DFE {i} {j}
funext {A = A} {B = B} {f = f} {g = g} p =
  ap (λ f x → fst (f x)) p'
  where
  f' : (x : A) → Singleton (f x)
  f' x = f x , refl
  g' : (x : A) → Singleton (f x)
  g' x = g x , p x
  contr : isContr ((x : A) → Singleton (f x))
  contr = wfunext (λ x → Singleton-isContr (f x))
  p' : f' ≡ g'
  p' = Contr-isProp contr f' g'


The above proof does not use univalence and therefore, without univalence,
WFE implies DFE. The converse also holds, as explained above:


DFE-to-WFE : ∀ {i j} → DFE {i} {j} → WFE {i} {j}
DFE-to-WFE funext c =
  (λ x → fst (c x)) , λ f → funext (λ x → snd (c x) (f x))


so that WFE and DFE are equivalent, even without assuming univalence. 


9.4.10 Propositional extensionality. Recall from section 9.3.1 that the
propositional extensionality axiom states that two logically equivalent
propositions A and B are equal:


PE : ∀ {i} → Type (lsuc i)
PE {i} = ∀ {A B : Type i} → isProp A → isProp B → A ↔ B → A ≡ B


This is intuitively justified because, since A and B are both propositions, they
are either empty or a point, and since they are equivalent they are both empty
or both non-empty. We show here that this principle follows from univalence.
Namely, two logically equivalent propositions A and B are equivalent: the
logical equivalence provides functions f : A → B and g : B → A and we have
g ∘ f(x) = x and f ∘ g(y) = y for every x in A and y in B because A and B are
propositions (and thus any two elements are equal).


↔-to-≃ : ∀ {i} {A B : Type i} →
  isProp A → isProp B → A ↔ B → A ≃ B
↔-to-≃ PA PB (f , g) =
  f ,
  ((g , (λ x → PA (g (f x)) x)) ,
   (g , (λ x → PB (f (g x)) x)))


Finally, univalence provides us with the required equality: 


propext : ∀ {i} → PE {i}
propext PA PB e = ua (↔-to-≃ PA PB e)


By transport, this means that given two equivalent propositions, one can be 
substituted for the other. We have already encountered an instance of this in 
lemma 2.2.9.1. 


9.5 Higher inductive types 


We have seen in section 9.4.7 that, if we assume the axiom of univalence, we can 
exhibit a type which is non-trivial, in the sense that it is not a set. However, we 
cannot easily construct a type which corresponds to a space we have in mind. In 
particular, we have mentioned in section 9.3.2 that all the usual (inductive) types 
are sets (e.g. natural numbers, lists of elements of a set, etc.). Higher inductive 
types are a generalization of inductive types that allow for constructing useful 
types, which are typically not sets. The presentation given here is very brief and 
the reader is invited to read [Uni13, chapter 6] for a more detailed presentation,
as well as [CCHM16] for a technical description of the theory behind the current 
implementation in Agda. 


9.5.1 Rules for higher types. In order to introduce types corresponding to 
spaces of interest, one way to proceed consists in adding new constructors and 
rules, as in section 8.3. We present this approach here. 


The interval type. As a first example consider the interval space


  beg ------ path ------> end


This type is of course a set (and even a contractible type), but the approach will
generalize to types which are not. The corresponding type, that we are going
to write I, can be thought of as freely generated by two points beg and end, as
well as a path path : beg = end, as figured above, which suggests the following
rules. The formation rule states that I is a well-formed type in any well-formed
context


      Γ ⊢
  ------------ (I_F)
  Γ ⊢ I : Type

The introduction rules state that beg and end are elements of the interval and
that path is a path between them:


      Γ ⊢                        Γ ⊢
  ----------- (I_I^beg)      ----------- (I_I^end)
  Γ ⊢ beg : I                Γ ⊢ end : I

             Γ ⊢
  ------------------------- (I_I^path)
  Γ ⊢ path : Id_I(beg, end)


The elimination rule is more subtle. What do we need in order to determine a
function from I to an arbitrary type A? In the case where A does not depend
on I, this is easy: we need two elements b and e of A (the respective images
of beg and end), as well as a path p from b to e (the image of path). The
corresponding rule should thus be


  Γ ⊢ t : I    Γ ⊢ b : A    Γ ⊢ e : A    Γ ⊢ p : Id_A(b, e)
  ---------------------------------------------------------- (I_E)
              Γ ⊢ rec(t, x ↦ A, b, e, p) : A


where rec(t, x ↦ A, b, e, p) can be thought of as the image of an arbitrary point t
of I when the path path is sent to p. As usual, we want to formulate this
elimination rule in the more general case where A depends on I, i.e. has a free
variable x of type I. We now expect b to be of type A[beg/x] and e of type
A[end/x], and now we are facing a problem: we cannot state anymore that
the path p should go from b to e, because b and e do not live in the same
type anymore! A way to overcome this problem, and be able to compare the
two points, consists in transporting the point b along path, see section 9.4.1,
in order to obtain a point b′ in A[end/x] and then require the path p to lie
between b′ and e.


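[Figure: the type A(beg) lying above the point beg of the interval, which goes
from beg to end along path.]


The resulting dependent elimination rule is then


               Γ ⊢ t : I        Γ, x : I ⊢ A : Type
  Γ ⊢ b : A[beg/x]    Γ ⊢ e : A[end/x]    Γ ⊢ p : Id_{A[end/x]}(b′, e)
  ---------------------------------------------------------------------- (I_E)
                Γ ⊢ rec(t, x ↦ A, b, e, p) : A[t/x]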


where b′ is a shorthand for transport(A, path, b). The computation rules state
that when we apply the elimination rule in the case where t is beg, end and
path, we recover b, e and p respectively:


                      Γ, x : I ⊢ A : Type
  Γ ⊢ b : A[beg/x]    Γ ⊢ e : A[end/x]    Γ ⊢ p : Id_{A[end/x]}(b′, e)
  ---------------------------------------------------------------------- (I_C^beg)
            Γ ⊢ rec(beg, x ↦ A, b, e, p) = b : A[beg/x]

                      Γ, x : I ⊢ A : Type
  Γ ⊢ b : A[beg/x]    Γ ⊢ e : A[end/x]    Γ ⊢ p : Id_{A[end/x]}(b′, e)
  ---------------------------------------------------------------------- (I_C^end)
            Γ ⊢ rec(end, x ↦ A, b, e, p) = e : A[end/x]

                      Γ, x : I ⊢ A : Type
  Γ ⊢ b : A[beg/x]    Γ ⊢ e : A[end/x]    Γ ⊢ p : Id_{A[end/x]}(b′, e)
  ---------------------------------------------------------------------- (I_C^path)
    Γ ⊢ apd(rec(−, x ↦ A, b, e, p), path) = p : Id_{A[end/x]}(b′, e)


We do not include a uniqueness rule because it can be shown to hold proposi- 
tionally (this is detailed in section 9.5.3 in the case of the circle type). 


The circle type. A type Circle corresponding to the circle can easily be
implemented, if we think of the circle as being freely generated by a point, that
we call base, and a path loop : base = base:


[Figure: the circle, with a point base and a path loop from base to itself.]


In other words, it is the above interval type, where the beginning and end points
have been identified.
The formation rule states that Circle is a well-formed type in a well-formed
context:


         Γ ⊢
  ----------------- (Circle_F)
  Γ ⊢ Circle : Type


The introduction rules allow typing the point base and the path loop:


          Γ ⊢                                      Γ ⊢
  ------------------ (Circle_I^base)    --------------------------------- (Circle_I^loop)
  Γ ⊢ base : Circle                     Γ ⊢ loop : Id_Circle(base, base)


The elimination rule states that a map from the circle Circle into an
arbitrary type A is determined by a point b of A (the image of base) and a
path p (which determines the image of loop, as explained above):


                            Γ ⊢ t : Circle
  Γ, x : Circle ⊢ A : Type    Γ ⊢ b : A[base/x]    Γ ⊢ p : Id_{A[base/x]}(b′, b)
  -------------------------------------------------------------------------------- (Circle_E)
                       Γ ⊢ rec(t, x ↦ A, b, p) : A[t/x]


where b′ is a shorthand for transport(A, loop, b). The computation rules are left
to the reader. The reader should get convinced that we could write the rules
for the type corresponding to the usual low dimensional spaces in this way.

Exercise 9.5.1.1. This is not the only way of implementing the circle. For
instance, formalize the type corresponding to the following description of the
circle:


[Figure: two points x and y joined by two paths p and q.]


i.e. freely generated by two points x and y and two paths p and q.

Exercise 9.5.1.2. Write down the rules for the type corresponding to the sphere.


9.5.2 Paths over. As noted above, when writing the elimination rule of types
involving paths as constructors one needs to compare elements (say, b and e) of
distinct types (say, A[beg/x] and A[end/x]), and the way we used to proceed
consisted in transporting the first along path into b′, so that it lies in the same
type as the second. Here, a path between b′ and e can be thought of as
representing a path between b and e, i.e. as a way of comparing two elements
which do not live in the same type. This is similar to what we have done in
section 6.6.9 when defining heterogeneous equality, although we have to be more
precise about equalities here.

Given a path p : x = y in a type A, a dependent type B : A → Type,
and two elements t : B(x) and u : B(y), we write t =^B_p u for the type of paths
over p between t and u. This intuitively corresponds to the collection of paths
between t and u whose projection onto A gives the path p.


As indicated above, this type can be defined using transport


PathOver : ∀ {i j} {A : Type i} (B : A → Type j) {x y : A}
  (p : x ≡ y) (t : B x) (u : B y) → Type j
PathOver B p t u = (transport B p t) ≡ u


although it is maybe clearer (and closer to the definition of heterogeneous
equality, see section 6.6.9) to define it by induction on the path p:


PathOver : ∀ {i j} {A : Type i} (B : A → Type j) {x y : A}
  (p : x ≡ y) (t : B x) (u : B y) → Type j
PathOver B refl t u = (t ≡ u)


It is convenient to introduce the following notation


syntax PathOver B p t u = t ≡ u [ B ↓ p ]


which allows writing in Agda


t ≡ u [ B ↓ p ]


what we have been writing t =^B_p u earlier. This new definition could be used
to simplify the types of functions in various places. For instance, the function
apd, see section 9.4.1, could be defined as


apd : ∀ {i j} {A : Type i} {B : A → Type j} (f : (a : A) → B a)
  {x y : A} → (p : x ≡ y) → f x ≡ f y [ B ↓ p ]
apd f refl = refl


9.5.3 The circle as a higher inductive type. As usual in Agda, instead of 
implementing types of interest one by one, we expect that they are particular 
cases of inductive types. For instance, the circle being generated by a point and 
a path, we expect that it can be described by the following inductive type: 


data Circle : Type₀ where
  base : Circle
  loop : base ≡ base


If you try this at home, of course Agda will reject it: as we have seen in
section 8.4, all the constructors defining an inductive type A should have A as
target, but here the type of the constructor loop is base ≡ base, i.e. an
equality between elements of Circle, not an element of Circle (unlike base
for instance). Higher inductive types are a generalization of inductive types
allowing constructors of equalities between elements of the type. Defining those
properly is out of scope here; we will only give some examples.
An extension of Agda, called cubical Agda, allows for trying them by beginning
our files with


{-# OPTIONS --cubical #-}


and importing the dedicated library


open import Cubical.Core.Everything


This allows in particular for the above definition to be accepted by Agda. From 
there, we can show the recursion principle associated to the circle type: 


Circle-rec : ∀ {i} {A : Type i} (b : A) (p : b ≡ b) → Circle → A
Circle-rec b p base = b
Circle-rec b p (loop t) = p t


It corresponds to the elimination rule and, as explained before, formalizes the
fact that a map from the circle to an arbitrary type A is determined by a point b
of A (the image of base) and a path p : b = b (the image of loop). As can
be observed above, when we perform pattern matching on an element of the
circle, Agda generates two cases: this element is either the point base or a point
loop t in the loop path. Here, the variable t can be thought of as indexing
the position where we are in the path loop: you can think of t as being a real
number between 0 and 1 such that loop 0 (resp. loop 1) is the start (resp. end)
of the loop, although we will not need to understand precisely what this variable
means here. The induction principle, which is the dependent variant
of the above, can also be proved in the same way:


Circle-ind : ∀ {i} {A : Circle → Type i} (b : A base)
  (p : b ≡ b [ A ↓ loop ]) (x : Circle) → A x
Circle-ind b p base = b
Circle-ind b p (loop t) = p t


We have indicated that the uniqueness rule could be derived propositionally: if 
two maps f and g from the circle to some type A have the same (i.e. proposi- 
tionally equal) image of the base and the same image of the loops then they are 
equal: 


Circle-unique : ∀ {i} {A : Type i} → (f g : Circle → A) →
  (p : f base ≡ g base) →
  ap f loop ≡ ap g loop [ (λ x → x ≡ x) ↓ p ] →
  (x : Circle) → f x ≡ g x
Circle-unique f g p q base = p
Circle-unique f g p q (loop t) t' = q t' t


Exercise 9.5.3.1. Show that a map from the circle to a type A is the same as a
loop in A, i.e. a path p : x = x for some point x of A:


Circle-path :
  ∀ {i} {A : Type i} → (Circle → A) ≃ Σ A (λ x → x ≡ x)


Exercise 9.5.3.2. Define the circle as a type Circle’ freely generated by two 
points and two paths between them, as explained in exercise 9.5.1.1. Show that 
the types Circle and Circle’ are equivalent and thus equal by univalence. 


The loop space of the circle. As an illustration of the use of this type and its
elimination principle, let us show a fundamental theorem of homotopy theory,
the fact that the type base = base consisting of equalities from base to itself,
or loops, is equivalent to ℤ (and thus equal by univalence). Namely, those paths
are characterized by the number of times they turn around the circle, the sign
encoding the direction of the loops. The proof follows the technique already
encountered in section 9.4.5 and is detailed in [Uni13, Section 8.1]: we are going
to show that we can encode the paths as elements of ℤ, as well as provide an
inverse decoding function. For reasons of “continuity”, we cannot reason only
on loops, and actually have to reason on all paths of the form base = x for an
arbitrary element x of the circle.

We first define a function code, which to every point x of the circle associates
a type in which we can encode paths base = x:


code : Circle → Set
code = Circle-rec Type₀ ℤ (ua suc-≃)


(we recall that the type ℤ of integers was defined in section 6.4.9). The base
point is sent to ℤ for the reason explained above, and the loop is sent to the
path ℤ = ℤ induced by the successor function on ℤ, which is an equivalence
(with predecessor as inverse function). Namely, following the loop of the circle
adds one to the number of loops of a path, and indeed, we have that transporting
an integer along the loop corresponds to taking its successor:


transport-loop : (n : ℤ) → transport code loop n ≡ suc n


Geometrically, the picture to have in mind is a helix standing above a circle:


[Figure: a helix, labeled code, standing above the circle, labeled Circle.]


The function code sends each point of the circle to the set of points above it,
which is isomorphic to ℤ, and transporting an integer along the loop sends it to
its successor.

We can encode the paths from the base point as elements of this type by
transporting 0 along the path:


enc : (x : Circle) → base ≡ x → code x
enc x p = transport code p zero


Conversely, we can decode an integer as a path by the function


dec : (x : Circle) → code x → base ≡ x
dec = Circle-ind
  (λ x → code x → base ≡ x)
  loops
  transport-loop-loops

which is defined by induction on the circle. For the base case, we send an
integer n to the loop of the circle concatenated n times with itself (and taking
the inverse when n is negative): this path is defined by induction on n by the
function


loops : ℤ → base ≡ base
loops (pos ℕ.zero)       = refl
loops (pos (ℕ.suc n))    = loops (pos n) ∙ loop
loops (negsuc ℕ.zero)    = ! loop
loops (negsuc (ℕ.suc n)) = loops (negsuc n) ∙ ! loop


For the loop case, we have to show that this function is invariant under transport
around loop:


transport-loop-loops :
  transport (λ x → code x → base ≡ x) loop loops ≡ loops


Finally, we can show that the two functions are mutually inverse. This is purely
formal in one direction:


dec-enc : (x : Circle) (p : base ≡ x) → dec x (enc x p) ≡ p
dec-enc .base refl = refl


In the other direction, this can be shown by induction on the circle:


enc-dec : (x : Circle) (n : code x) → enc x (dec x n) ≡ n
enc-dec = Circle-ind
  (λ x → (n : code x) → enc x (dec x n) ≡ n)
  (λ n →
    enc base (dec base n)         ≡⟨ refl ⟩
    transport code (loops n) zero ≡⟨ transport-loops n zero ⟩
    n + zero                      ≡⟨ +-unit-r n ⟩
    n                             ∎)
  (funext (λ n → ℤ-isSet _ _ _ _))


where
transport-loops : (m n : ℤ) → transport code (loops m) n ≡ m + n


is a generalization of transport-loop obtained by induction, +-unit-r is a proof
that addition admits 0 as neutral element on the right, and ℤ-isSet is a proof
that ℤ is a set (which follows from the decidability of equality by Hedberg's
theorem, see section 9.3.2).
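
The encoding and decoding above can be mimicked in a very rough toy model,
given here in Python rather than Agda. This is only a sketch under strong
assumptions: a loop on base is represented as a list of steps +1 (for loop) and
-1 (for its inverse), which ignores the fact that loop followed by its inverse
is equal to refl. The names loops and enc mirror the Agda functions, while
concat is a hypothetical helper standing for path concatenation.

```python
def loops(n):
    """The loop concatenated n times with itself (inverse loop when n < 0)."""
    return [1] * n if n >= 0 else [-1] * (-n)


def enc(path):
    """Start from 0 and take the successor (resp. predecessor) at each step,
    mimicking the transport of zero along the path."""
    n = 0
    for step in path:
        n += step
    return n


def concat(p, q):
    """Concatenation of two paths represented as lists of steps."""
    return p + q


# Encoding after decoding gives back the integer we started from.
round_trips = [enc(loops(n)) for n in range(-3, 4)]
```

In this model, encoding a concatenation adds the encodings of its two parts,
which is the content of transport-loops.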


9.5.4 Useful higher inductive types. In order to further illustrate the use 
of higher inductive types, we briefly present two quite useful ones: suspension 
and propositional truncation. 


Suspension. The suspension ΣA of a space A is the space obtained from A by
adding two new points N and S (for “north” and “south”, these two points
being thought of as respectively lying above and below the original space A),
as well as a path going from N to S passing through x for each point x of A. For
instance, starting from the space consisting of a point and segment figured on
the left, we obtain the space on the right:


[Figure: on the left, a space consisting of a point and a segment; on the right,
its suspension, with the two new points N and S.]


In particular, if we iteratively apply this suspension operation starting from the
empty space, we obtain the spheres:


[Figure: the successive suspensions of the empty space.]


More precisely, the n-sphere is the (n+1)-th suspension of the empty space (the
empty space could thus be considered as a good notion of (−1)-sphere). In
Agda, we can define the suspension of a type as the higher inductive type


data Susp {i} (A : Type i) : Type i where
  N : Susp A
  S : Susp A
  p : (x : A) → N ≡ S


and the function which to a natural number n associates the n-sphere by


Sphere : ℕ → Type₀
Sphere zero = Susp ⊥
Sphere (suc n) = Susp (Sphere n)


Propositional truncation. The propositional truncation operation introduced in
section 9.3.4 can also be defined as a higher inductive type. In order to do so,
we should recall that the propositional truncation ∥A∥ of a type A is the type
obtained from A by turning it into a proposition, i.e. by formally adding a path
between any pair of points. This suggests the following definition as a higher
inductive type:


data ∥_∥ {i} (A : Type i) : Type i where
  ∣_∣ : A → ∥ A ∥
  ∥∥-isProp : (x y : ∥ A ∥) → x ≡ y


The first constructor (∣_∣) states that any point of A is a point of ∥A∥, and
the second one (∥∥-isProp) adds all the required paths. The resulting type
is trivially a proposition by ∥∥-isProp and the associated recursion principle,
which corresponds to the elimination rule (∥∥_E), can be shown as follows:


∥∥-rec : ∀ {i j} {A : Type i} {B : Type j} →
  isProp B → (A → B) → ∥ A ∥ → B
∥∥-rec PB f ∣ x ∣ = f x
∥∥-rec PB f (∥∥-isProp x y t) =
  PB (∥∥-rec PB f x) (∥∥-rec PB f y) t


It can, for instance, be used to construct the canonical map ∥A∥ → ¬¬A for an
arbitrary type A described in section 9.3.4:


∥∥-¬¬ : ∀ {i} {A : Type i} → ∥ A ∥ → ¬ (¬ A)
∥∥-¬¬ = ∥∥-rec ¬-isProp (λ x f → f x)


Appendix


A.1 Relations 


A.1.1 Definition. Given a set A, a relation R on A is a subset R ⊆ A × A. We
sometimes write a R b when (a, b) ∈ R. It is


— reflexive if a R a for every a ∈ A,
— transitive if a R c for every a, c ∈ A such that there exists b ∈ A for which
  a R b and b R c,
— symmetric if b R a for every a, b ∈ A such that a R b,
— antisymmetric if a R b and b R a implies a = b.


A preorder is a reflexive and transitive relation. A partial order is a reflexive, 
transitive and antisymmetric relation. An equivalence relation is a relation 
which is reflexive, transitive and symmetric. 


A.1.2 Closure. We suppose fixed a relation R on A. Its reflexive (resp. tran- 
sitive, resp. symmetric) closure is the smallest reflexive (resp. ...) relation con- 
taining R. It always exists since it can be shown to be the intersection of all 
reflexive (resp. ...) relations containing R. Concretely, 


— the reflexive closure of R is


R ∪ {(a, a) | a ∈ A}


— the transitive closure of R is


R ∪ {(a_0, a_n) | n > 0, (a_0, a_1) ∈ R, (a_1, a_2) ∈ R, …, (a_{n−1}, a_n) ∈ R}


— the symmetric closure of R is


R ∪ {(b, a) | (a, b) ∈ R}
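

For a finite relation, the three closures above can be computed directly. The
following Python sketch is only an illustration: it assumes that a relation is
represented as a set of pairs, and the function names as well as the sample
relation R on A = {1, 2, 3} are hypothetical.

```python
from itertools import product


def reflexive_closure(rel, carrier):
    """R together with all the pairs (a, a) for a in the carrier set."""
    return set(rel) | {(a, a) for a in carrier}


def symmetric_closure(rel):
    """R together with all the reversed pairs."""
    return set(rel) | {(b, a) for (a, b) in rel}


def transitive_closure(rel):
    """Add (a, c) whenever (a, b) and (b, c) are present, until stable."""
    closure = set(rel)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        if new <= closure:
            return closure
        closure |= new


A = {1, 2, 3}
R = {(1, 2), (2, 3)}
```

The loop in transitive_closure terminates on a finite relation because the
closure can only grow inside the finite set of all pairs.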


The following characterization is often useful (and similar results hold for other 
closure operations): 


Lemma A.1.2.1. The reflexive and transitive closure R* of a relation R on a
set A is the smallest relation R* ⊆ A × A such that


— a R* a for every a ∈ A,
— a R* c for every a, c ∈ A such that there exists b ∈ A for which a R b and
  b R* c.


A.1.3 Quotient. Suppose that R is an equivalence relation. An equivalence
class E under R is a non-empty subset E ⊆ A such that, for every a ∈ A and
b ∈ E, we have a ∈ E if and only if (a, b) ∈ R. The quotient A/R of A under R
is the set of equivalence classes of A.


A.1.4 Congruence. Given a function f : A^n → A for some n ∈ ℕ, the relation
R is a congruence for f when, given (a_1, …, a_n) and (b_1, …, b_n) such that
a_i R b_i for every 1 ≤ i ≤ n, we have f(a_1, …, a_n) R f(b_1, …, b_n). In this
case, f induces a quotient function f̄ on A/R defined by


f̄(E_1, …, E_n) = [f(a_1, …, a_n)]


for some (a_1, …, a_n) ∈ E_1 × … × E_n, where [a] denotes the equivalence class
of a: this function can be shown not to depend on the choice of (a_1, …, a_n).
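

As an illustration, consider addition on the integers together with the
relation of being equal modulo 3, which is a congruence for it. In the Python
sketch below, an equivalence class is represented by its canonical
representative in {0, 1, 2}; the modulus and all the names are assumptions made
for the example only.

```python
MODULUS = 3  # hypothetical choice of modulus


def related(a, b):
    """The relation R: a and b are equal modulo MODULUS."""
    return (a - b) % MODULUS == 0


def f(a, b):
    """The function f : A^2 -> A, here addition on integers."""
    return a + b


def cls(a):
    """Canonical representative of the equivalence class of a."""
    return a % MODULUS


def f_quot(e1, e2):
    """The quotient function, computed on representatives of the classes."""
    return cls(f(e1, e2))


sample = range(-6, 7)
```

The fact that f_quot does not depend on the chosen representatives amounts to
cls(f(a, b)) being equal to f_quot(cls(a), cls(b)) for all integers a and b.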


A.2 Monoids 


A.2.1 Definition. A monoid (M, ·, 1) is a set M equipped with
— a function _·_ : M × M → M called multiplication,
— an element 1 ∈ M called unit,
such that for all elements u, v, w ∈ M we have


(u · v) · w = u · (v · w)        1 · u = u = u · 1


Such a monoid is
— commutative when u · v = v · u for every u, v ∈ M,
— idempotent when u · u = u for every u ∈ M.


A morphism f from a monoid (M, ·_M, 1_M) to a monoid (N, ·_N, 1_N) is a function
f : M → N such that


f(u ·_M v) = f(u) ·_N f(v)        f(1_M) = 1_N


A.2.2 Free monoids. Given a set A, we write (A*, ·, 1) for the monoid such
that A* is the set of words on A, i.e. finite sequences a_1 … a_n of elements of A,
multiplication is concatenation, i.e.


(a_1 … a_n) · (b_1 … b_m) = a_1 … a_n b_1 … b_m


and unit 1 is the empty sequence. We write |a_1 … a_n| = n for the length of a
word.


Proposition A.2.2.1. The monoid (A*, ·, 1) is the free monoid on A: given a
monoid (M, ·, 1) and a function f : A → M, there exists a unique morphism of
monoids f̄ : A* → M such that f̄(a) = f(a) for every a ∈ A.
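

Concretely, the morphism obtained from f sends a word a_1 … a_n to the product
f(a_1) · … · f(a_n). The following Python sketch illustrates this under the
assumption that words are represented as tuples and that the target monoid is
given by a multiplication function and a unit; the names extend and length are
hypothetical.

```python
from functools import reduce


def extend(f, mul, unit):
    """Return the map sending a word a_1...a_n to f(a_1)·...·f(a_n)."""

    def f_bar(word):
        return reduce(lambda acc, a: mul(acc, f(a)), word, unit)

    return f_bar


# Target monoid (N, +, 0): sending every letter to 1 computes the length.
length = extend(lambda a: 1, lambda u, v: u + v, 0)

u = ("a", "b")
v = ("c",)
```

The empty word is sent to the unit and concatenation is sent to the
multiplication of the target monoid, as required for a morphism.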


Given a set A, we define in appendix A.3.5 below the set A* of all multisets
on A. It is a monoid when equipped with disjoint union ⊎ as multiplication and
empty multiset ∅ as unit.

Proposition A.2.2.2. The monoid (A*, ⊎, ∅) is the free commutative monoid
on A: given a commutative monoid (M, ·, 1) and a function f : A → M, there
exists a unique morphism of monoids f̄ from this monoid to M such that
f̄(a) = f(a) for every a ∈ A.


Given a set A, we write 𝒫(A) for the set of subsets of A. It is a monoid when
equipped with union ∪ as multiplication and empty set ∅ as unit.


Proposition A.2.2.3. The monoid (𝒫(A), ∪, ∅) is the free idempotent
commutative monoid on A: given an idempotent commutative monoid (M, ·, 1)
and a function f : A → M, there exists a unique morphism of monoids f̄ from
this monoid to M such that f̄(a) = f(a) for every a ∈ A.


A.3 Well-founded orders 


A.3.1 Partial orders. A partially ordered set or poset (A, ≤) is a set A
equipped with a relation ≤, called partial order, which is reflexive, transitive
and antisymmetric (see also appendix A.1). A partial order is total when for
every a, b ∈ A we have either a ≤ b or b ≤ a.


A.3.2 Well-founded orders. A poset is well-founded when there is no strictly
decreasing infinite sequence


a_0 > a_1 > a_2 > …


This is equivalent to requiring that every infinite weakly decreasing sequence


a_0 ≥ a_1 ≥ a_2 ≥ …


is eventually stationary


∃n ∈ ℕ. ∀i ∈ ℕ. (i ≥ n) ⇒ (a_i = a_{i+1})


A chain in A is a totally ordered subset of A. It is ascending when it has a 
minimal element and descending when it has a maximal element. A well-founded 
poset is thus a poset in which every descending chain is finite. 

Well-founded orders are particularly interesting because they satisfy the
following induction principle:

Theorem A.3.2.1 (Well-founded induction). Suppose given a property P(a) on
the elements a of a well-founded poset (A, ≤). Suppose moreover that for every
element a ∈ A, if P(b) holds for every element b < a then P(a) holds. Then
P(a) holds for every element a of A.


Proof. By contradiction, suppose that there exists an element a_0 ∈ A such that
P(a_0) does not hold. By hypothesis, this means that there is an element a_1 < a_0
such that P(a_1) does not hold. By the same reasoning applied to a_1, we can
construct an element a_2 < a_1 such that P(a_2) does not hold. Iterating this
reasoning, we construct an infinite sequence


a_0 > a_1 > a_2 > …


of elements a_i such that a_i > a_{i+1} and P(a_i) does not hold. Since (A, ≤) is
well-founded, such a sequence cannot exist.

Remark A.3.2.2. The above proof does not exploit the fact that ≤ is transitive
nor antisymmetric, and the reasoning would in fact hold for any relation R in
place of <. A relation R on a set A is well-founded if there is no infinite sequence
of elements a_i of A such that


a_0 R a_1 R a_2 R …


An induction principle similar to theorem A.3.2.1 holds for such relations.


The prototypical well-founded order is the subterm order. Suppose fixed a
signature Σ, see section 5.1.1. We define the subterm order ≤ on the terms in
this signature by u ≤ t whenever u is a subterm of t, see section 5.1.2.


Lemma A.3.2.3. The relation ≤ is a partial order.
Theorem A.3.2.4. The relation ≤ is well-founded.


Proof. We define the height ht(t) of a term t by induction on t by


ht(x) = 0        ht(f(t_1, …, t_n)) = 1 + ⋁_{1 ≤ i ≤ n} ht(t_i)


where ⋁ denotes the maximum, which is 0 when n = 0. It is easily shown that
u < t implies ht(u) < ht(t). Therefore, if the subterm
order was not well-founded, (ℕ, ≤) would not be well-founded either.
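

The height function of the proof can be sketched as follows in Python. The
representation of terms is an assumption made for the example: a variable is a
string and an application f(t_1, …, t_n) is a pair consisting of the symbol and
the list of arguments. The helper strict_subterms is hypothetical.

```python
def ht(t):
    """Height of a term: 0 for a variable, 1 + max of the arguments otherwise."""
    if isinstance(t, str):  # a variable
        return 0
    _symbol, args = t  # an application f(t_1, ..., t_n)
    return 1 + max((ht(u) for u in args), default=0)


def strict_subterms(t):
    """Enumerate the subterms of t which are different from t itself."""
    if isinstance(t, str):
        return
    for u in t[1]:
        yield u
        yield from strict_subterms(u)


# The term f(g(x), y).
t = ("f", [("g", ["x"]), "y"])
```

On this example, every strict subterm has a height strictly smaller than the
height of the term, which is the property used in the proof.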


A.3.3 Lexicographic order. Given two posets (A, ≤_A) and (B, ≤_B), we define
the lexicographic order ≤ on A × B by (a, b) ≤ (a′, b′) whenever a <_A a′, or
a = a′ and b ≤_B b′.

Lemma A.3.3.1. The relation ≤ on A × B is a partial order.

Lemma A.3.3.2. The partial order ≤ is total when both ≤_A and ≤_B are.


Theorem A.3.3.3. The partial order ≤ is well-founded when both ≤_A and ≤_B
are.


Proof. Suppose given an infinite sequence


(a_0, b_0) > (a_1, b_1) > (a_2, b_2) > …


By definition of >, for every index i, we either have a_i > a_{i+1}, or
a_i = a_{i+1} and b_i > b_{i+1}. In particular, the sequence of the a_i is weakly
decreasing. If the set


{i ∈ ℕ | a_i > a_{i+1}}


is infinite, the corresponding elements form an infinite strictly decreasing
sequence of elements of A. Otherwise, there is an index n such that a_i = a_{i+1}
for every i ≥ n, and b_n > b_{n+1} > … is thus an infinite strictly decreasing
sequence of elements of B. Both cases are impossible since both posets (A, ≤_A)
and (B, ≤_B) are supposed to be well-founded.


Given a well-founded poset (A, ≤), the lexicographic order is a well-founded
order on A^2 = A × A, and we can iterate the construction in order to obtain
a well-founded order on A^n, still called lexicographic and written ≤_lex, for any
natural number n, using the fact that A^{n+1} = A × A^n. Finally, we can construct
an order ≤ on A*, called the deglex order, such that u < v when


— |u| < |v| (u is shorter than v), or
— |u| = |v| and u <_lex v (u is lexicographically smaller than v).

Theorem A.3.3.4. Given a well-founded poset (A, ≤), the associated deglex order
on A* is well-founded. Moreover, it is total if the order ≤ is.


Remark A.3.3.5. This order is different from the usual dictionary order (we
compare the first letter of the words, then the second, and so on) which is not
well-founded: with elements a, b ∈ A such that a > b, we have the infinite
decreasing sequence


a > ba > bba > bbba > bbbba > …
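

The difference between the two orders can be observed on strings. The Python
sketch below makes some assumptions: words are strings, letters are compared
with the usual order on characters (so that "a" is smaller than "b", contrary
to the convention of the remark), and the names lex_lt and deglex_lt are
hypothetical. The comparison of Python strings plays the role of the
dictionary order.

```python
def lex_lt(u, v):
    """Strict lexicographic order on two words of the same length."""
    for a, b in zip(u, v):
        if a != b:
            return a < b
    return False


def deglex_lt(u, v):
    """Strict deglex order: compare lengths first, then lexicographically."""
    if len(u) != len(v):
        return len(u) < len(v)
    return lex_lt(u, v)


# First words of a sequence which is decreasing for the dictionary order.
words = ["b", "ab", "aab", "aaab"]
```

With this convention on letters, the sequence b, ab, aab, … keeps decreasing
for the dictionary order, whereas it is increasing for the deglex order since
the words get longer.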


A.3.4 Trees. A (non-planar rooted) tree is a set $T$ equipped with a distinguished
element $x_0$ and a function $\tau : T \setminus \{x_0\} \to T$, satisfying

$$\forall x \in T.\ \exists n \in \mathbb{N}.\ \tau^n(x) = x_0$$

The elements of $T$ are called the nodes of the tree and $x_0$ is called the root node.
Given $x \in T \setminus \{x_0\}$, $\tau(x)$ is called the parent of $x$, and $x$ is a child of $\tau(x)$. A
node $x$ such that $\tau^{-1}(x) = \emptyset$ is called a leaf. Given a node $x \in T$, the subtree
at $x$ is the tree

$$T_x = \{y \in T \mid \exists n \in \mathbb{N}.\ \tau^n(y) = x\}$$

with parent function $\tau_x$ such that $\tau_x(y) = \tau(y)$ for $y \neq x$.

Lemma A.3.4.1. The set of nodes of a tree $T$ satisfies

$$T = \{x_0\} \sqcup \bigsqcup_{x \in \tau^{-1}(x_0)} T_x$$

where $x_0$ is the root of $T$.
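
To make these definitions concrete, here is a small OCaml sketch of a finite
tree described by its parent function. The encoding (nodes numbered from 0, the
root being 0, and an array parent) and the names reaches, subtree and children
are assumptions made for this example.

```ocaml
(* A finite tree on the nodes 0, ..., 5 with root 0: parent.(x) is the parent
   of x for x <> 0 (the entry parent.(0) is unused). *)
let parent = [| 0; 0; 0; 1; 1; 3 |]

(* All the nodes of the tree. *)
let nodes = List.init (Array.length parent) Fun.id

(* Test whether some iterate of the parent function sends y to x: we follow
   the parents starting from y and stop when reaching the root. *)
let rec reaches x y =
  y = x || (y <> 0 && reaches x parent.(y))

(* The nodes of the subtree at x. *)
let subtree x = List.filter (reaches x) nodes

(* The children of x, i.e. the preimage of x under the parent function. *)
let children x = List.filter (fun y -> y <> 0 && parent.(y) = x) nodes
```

Here, children 0 is [1; 2], subtree 1 is [1; 3; 4; 5] and subtree 2 is [2]:
together with the root, the subtrees at the children of the root cover every
node exactly once, as stated in lemma A.3.4.1.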


A tree is finite when its set of nodes is finite and infinite otherwise. A tree $T$
is finitely branching when for every node $x \in T$ its set of children $\tau^{-1}(x)$ is
finite. A branch of a tree is a sequence of nodes $x_0, x_1, \ldots$ (finite or not) such
that $x_0$ is the root of the tree and for every index $i > 0$, $\tau(x_i) = x_{i-1}$.


Lemma A.3.4.2 (Kőnig's lemma). A finitely branching infinite tree has an infinite
branch.


Proof. Suppose fixed a finitely branching infinite tree $T$. We define an infinite
branch $(x_i)_{i \in \mathbb{N}}$, with the property that the subtree at $x_i$ is infinite, by induction
on $i$. We set $x_0$ to be the root of $T$ and, supposing that $x_i$ is defined, we
define $x_{i+1}$ as follows. By hypothesis, the set $\tau^{-1}(x_i)$ is finite and the subtree
$T_{x_i}$ is infinite. From lemma A.3.4.1, applied to $T_{x_i}$, we deduce that there exists
$x_{i+1} \in \tau^{-1}(x_i)$ such that the subtree at $x_{i+1}$ is infinite.


A labeled tree is a tree equipped with a function which to every node asso- 
ciates a label, which is an element of some fixed set. 

A.3.5 Multisets. Suppose fixed a set $A$. A multiset is a function

$$M : A \to \mathbb{N}$$

It can be thought of as a collection of elements of $A$ where each element
$a \in A$ occurs $M(a)$ times. We thus write $a \in M$ whenever $M(a) > 0$. The set
of multisets on $A$ is written $A^\#$.

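The domain $\mathrm{dom}(M)$ of a multiset $M$ is the set

$$\mathrm{dom}(M) = \{a \in A \mid M(a) > 0\}$$

A multiset $M$ is finite when $\mathrm{dom}(M)$ is finite. We write $A^\#_{\mathrm{fin}}$ for the set of finite
multisets over $A$. We write $\emptyset$ for the empty multiset, such that $\emptyset(a) = 0$ for
every $a \in A$. Given $a \in A$, we write $\{a\}$ for the singleton at $a$, which is the
multiset such that

$$\{a\}(b) = \begin{cases} 1 & \text{if } b = a, \\ 0 & \text{otherwise.} \end{cases}$$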

Given multisets $M$ and $N$ on $A$ their union $M \uplus N$ is defined, for $a \in A$, by

$$(M \uplus N)(a) = M(a) + N(a)$$

We write $M \subseteq N$ whenever $M(a) \leq N(a)$ for every $a \in A$ and in this case, we
define their difference $N \setminus M$ by

$$(N \setminus M)(a) = N(a) - M(a)$$

for $a \in A$.

Suppose that $(A, \leq)$ is a poset. We define a partial order $\leq^\#$ on $A^\#$, called
the multiset extension of $\leq$, by $M \leq^\# N$ whenever there exist finite multisets
$X$ and $Y$ on $A$, with $X \subseteq N$, such that

$$M = (N \setminus X) \uplus Y \quad\text{and}\quad \forall y \in Y.\ \exists x \in X.\ y < x$$

This order is such that we get a smaller multiset by removing an element and
replacing it with an arbitrary number of smaller elements: the elements get
smaller and smaller, but also more and more numerous. It can still be shown
that the resulting order is well-founded when the original one is [DM79].
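
The operations on finite multisets and the definition of the multiset extension
can be sketched in OCaml as follows. This is only an illustration under
simplifying assumptions: the elements are integers with their usual order, a
finite multiset is represented by a list with repetitions, and the names
multiset, mult, union, remove_one, diff and smaller are ours.

```ocaml
(* A finite multiset of integers, represented as a list with repetitions
   (the order of the elements in the list is irrelevant). *)
type multiset = int list

(* The multiplicity of a in m. *)
let mult (m : multiset) a = List.length (List.filter (( = ) a) m)

(* Union: the multiplicities add up. *)
let union (m : multiset) (n : multiset) : multiset = m @ n

(* Remove one occurrence of a, returning None when there is none. *)
let rec remove_one a = function
  | [] -> None
  | b :: m when a = b -> Some m
  | b :: m -> Option.map (fun m' -> b :: m') (remove_one a m)

(* The difference of n and x, which is defined only when x is included
   in n. *)
let rec diff (n : multiset) (x : multiset) : multiset option =
  match x with
  | [] -> Some n
  | a :: x' -> Option.bind (remove_one a n) (fun n' -> diff n' x')

(* Remove x from n and add y, provided that x is included in n and that every
   element of y is strictly below some element of x. *)
let smaller n x y =
  if List.for_all (fun b -> List.exists (fun a -> b < a) x) y
  then Option.map (fun d -> union d y) (diff n x)
  else None
```

For instance, smaller [5; 1] [5] [4; 4; 3] returns Some [1; 4; 4; 3]: the
element 5 has been replaced by three smaller elements. On the contrary,
smaller [5; 1] [5] [6] returns None since 6 is not below 5.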


Theorem A.3.5.1. The poset $(A^\#_{\mathrm{fin}}, \leq^\#)$ of finite multisets on $A$ is well-founded
if and only if $(A, \leq)$ is well-founded.


Proof. The left-to-right implication is easy, we show the right-to-left implication.
We define a relation $\prec$ on finite multisets by $M \prec N$ when there exist $a \in N$
and a finite multiset $Y$ such that

$$M = (N \setminus \{a\}) \uplus Y$$

and $b < a$ for every $b \in Y$. The relation $\leq^\#$ is easily shown to be the reflexive
and transitive closure of $\prec$. Now, by contradiction, suppose that there is an
infinite strictly decreasing sequence for $\leq^\#$. This means that there exists an
infinite sequence

$$M_0 \succ M_1 \succ M_2 \succ \cdots$$

where

$$M_{i+1} = (M_i \setminus \{x_i\}) \uplus Y_i$$

for every index $i$. We construct a growing sequence of trees $T_i$ labeled in $A \sqcup \{\bot\}$
as follows. The tree $T_0$ consists of a root and, for every element $a$ of $A$, we add to
the root as many children labeled by $a$ as the multiplicity of $a$ in $M_0$. The tree
$T_{i+1}$ is obtained from $T_i$ by picking a leaf labeled by $x_i$ and adding to it as
children the elements of $Y_i$ counted with multiplicities. In the case where $Y_i$ is
empty we add a single node labeled by $\bot$. The inductive limit of this process is a
tree $T_\infty$, which is infinite because at least one node is added at each step (thus
the special case when $Y_i$ is empty), and finitely branching because $M_0$ and the $Y_i$
are finite. We deduce from lemma A.3.4.2 that the tree $T_\infty$ admits an infinite
branch: the labels of its vertices $x_0, x_1, x_2, \ldots$ form an infinite strictly
decreasing sequence of elements of $A$. Contradiction.


A.4 Cantor’s diagonal argument 


Cantor's diagonal argument is a general method to show that two sets are not
in bijection. For instance, suppose that we have a bijection between sequences
of elements of $\mathbb{N}$ and $\mathbb{N}$. By the bijection, we have an enumeration of all the
sequences and we write $(n^j_i)_{i \in \mathbb{N}}$ for the $j$-th sequence. We can build a table
whose columns and rows respectively correspond to $i$ and $j$, and cells contain
the $n^j_i$:

$$
\begin{array}{c|ccccc}
       & 0     & 1     & 2     & 3     & \cdots \\
\hline
0      & n^0_0 & n^0_1 & n^0_2 & n^0_3 & \cdots \\
1      & n^1_0 & n^1_1 & n^1_2 & n^1_3 & \cdots \\
2      & n^2_0 & n^2_1 & n^2_2 & n^2_3 & \cdots \\
3      & n^3_0 & n^3_1 & n^3_2 & n^3_3 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{array}
$$

Then pick any sequence $(m_i)$ such that, for every index $i$, $m_i$ is a natural number
different from $n^i_i$. Since we have an enumeration of all sequences, there is an
index $k$ such that $(m_i) = (n^k_i)$. But we have $m_k \neq n^k_k$. Contradiction. There is
thus no bijection between $\mathbb{N} \to \mathbb{N}$ and $\mathbb{N}$.
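
The diagonal construction is easily expressed in OCaml. In the sketch below,
enum is a hypothetical enumeration chosen for the example (any function of
this type would do) and the name diag is ours: whatever enum is, the
sequence diag enum differs from the k-th enumerated sequence at index k.

```ocaml
(* A hypothetical enumeration of sequences of natural numbers: enum j is the
   j-th sequence, here the sequence of multiples of j. *)
let enum (j : int) : int -> int = fun i -> j * i

(* The diagonal sequence associated with an enumeration: its i-th element is
   chosen to be different from the i-th element of the i-th sequence. *)
let diag (enum : int -> int -> int) : int -> int = fun i -> enum i i + 1
```

For every index k, the values diag enum k and enum k k differ by one, so that
the sequence diag enum cannot be equal to enum k.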


A.4.1 A general Cantor argument. A more general form of the Cantor 
argument is the following. 


Theorem A.4.1.1. Suppose given sets $A$ and $B$ such that $B$ contains at least
two distinct elements $y_0$ and $y_1$. Then there is no surjection from $A$ to $A \to B$.


Proof. Suppose given a surjection $\varphi : A \to (A \to B)$. We consider the function
$f : A \to B$ defined by

$$f(x) = \begin{cases} y_1 & \text{if } \varphi(x)(x) = y_0, \\ y_0 & \text{otherwise.} \end{cases}$$

Given an element $x \in A$, we have $\varphi(x)(x) \neq f(x)$, thus $\varphi(x) \neq f$, and $\varphi$ is not
surjective. Contradiction.


A formalization of the above proof is given below. From a constructive point of
view, it requires being able to decide equality with $y_0$ in $B$. This is of course
the case when $B$ has decidable equality, e.g. $B = \mathbb{N}$, see section 6.6.8.

Theorem A.4.1.2. Given sets $A$ and $B$ such that $B$ contains at least two distinct
elements $y_0$ and $y_1$, there is no injection from $A \to B$ to $A$.


Proof. Suppose given an injection $\psi : (A \to B) \to A$. We define a function
$\varphi : A \to (A \to B)$ by

$$\varphi(x) = \begin{cases} a \mapsto y_0 & \text{if there is no } f : A \to B \text{ such that } \psi(f) = x, \\ f \text{ for some } f : A \to B \text{ such that } \psi(f) = x, & \text{otherwise.} \end{cases}$$

Given $f : A \to B$, we have, by definition,

$$\psi \circ \varphi \circ \psi(f) = \psi(f)$$

Thus, by injectivity of $\psi$,

$$\varphi(\psi(f)) = f$$

and $\varphi$ is surjective. We conclude using theorem A.4.1.1.


Note that the above proof implicitly requires the excluded middle in order to
construct the function $\varphi$. It is apparently not possible to prove this theorem in
a constructive setting [Bau11].


Corollary A.4.1.3. Given a set $A$, write $\mathcal{P}(A)$ for its powerset. There is no
surjection $A \to \mathcal{P}(A)$ and no injection $\mathcal{P}(A) \to A$. In particular, there is no
bijection between $A$ and $\mathcal{P}(A)$.


Proof. Taking $B = \{0, 1\}$ in the previous theorems, we have $\mathcal{P}(A) \simeq (A \to B)$
and we conclude.


Corollary A.4.1.4. There is no bijection between $\mathbb{N} \to \mathbb{N}$ and $\mathbb{N}$.


Proof. Take $A = B = \mathbb{N}$ in the previous theorems.


Lemma A.4.1.5. The set $P$ of programs (in any reasonable language) is countable.


Proof. A program is a finite sequence of characters: writing $\Sigma$ for the finite set
of characters (e.g. the UTF-8 characters), programs are elements of $\Sigma^*$. In other
words, writing $P$ for the set of programs, we have $P \subseteq \Sigma^*$. The set $\Sigma$ can be
totally ordered (e.g. $a < b < c < \ldots$), thus $\Sigma^*$ is totally ordered by the deglex
order (theorem A.3.3.4) and thus $P$ is totally ordered, as a subset of a totally
ordered set. Given a program $p \in P \subseteq \Sigma^*$, writing $n$ for its length, the elements
below it belong to the set $\bigsqcup_{i \leq n} \Sigma^i$, which is finite as a finite union of finite
sets. We can thus associate, to every program $p \in P$, the natural number $n_p$
defined as the cardinal of the longest ascending chain in $P$ with $p$ as maximal
element (which is finite by the previous argument). The function $P \to \mathbb{N}$ thus
defined is easily seen to be a bijection.
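
The finiteness argument of the above proof can be illustrated in OCaml by
computing the position of a word in the deglex order. This is a sketch under
assumptions of ours: the alphabet is reduced to the three letters a, b and c,
and the function rank counts all the words of $\Sigma^*$ strictly below a given one,
whereas the number $n_p$ of the proof only takes the words which are programs
into account.

```ocaml
(* The assumed alphabet, whose letters are ordered as in the string. *)
let alphabet = "abc"

(* The number of letters. *)
let k = String.length alphabet

(* The rank of a word is the number of words strictly below it in the deglex
   order: the words which are strictly shorter, plus the words of the same
   length which are lexicographically smaller (this last number is the word
   read as a numeral in base k). The exception Not_found is raised when a
   character is not in the alphabet. *)
let rank (w : string) : int =
  let shorter = ref 0 and power = ref 1 and value = ref 0 in
  String.iter
    (fun c ->
       (* power is the number of words of the length reached so far *)
       shorter := !shorter + !power;
       power := !power * k;
       value := !value * k + String.index alphabet c)
    w;
  !shorter + !value
```

The first words in deglex order are the empty word, a, b, c and aa, whose
ranks are respectively 0, 1, 2, 3 and 4.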


Corollary A.4.1.6. There is a function $\mathbb{N} \to \mathbb{N}$ which is not computable by a
program.


Proof. By contradiction, suppose that this is not the case. This means that
there is a surjection $\varphi : P \to (\mathbb{N} \to \mathbb{N})$ and, by precomposing with the
isomorphism $\mathbb{N} \simeq P$ of lemma A.4.1.5, a surjection $\mathbb{N} \to (\mathbb{N} \to \mathbb{N})$. We conclude by
theorem A.4.1.1.




A.4.2 Agda formalization. We now provide a formalization of the above
theorem A.4.1.1. Given a function $f : A \to B$ and an element $y$ of $B$, the fiber of $f$
at $y$, also called the preimage of $y$ under $f$, is the collection of elements of $A$
whose image is $y$:


fib : ∀ {i} {A B : Set i} → (f : A → B) → (y : B) → Set i
fib {_} {A} f y = Σ A (λ x → f x ≡ y)


Such a function is surjective when every element of $B$ admits a preimage under $f$,
i.e. there exists an element in the fiber of any point:


surjective : ∀ {i} {A B : Set i} (f : A → B) → Set i
surjective f = ∀ y → fib f y


The formal proof of the theorem then follows directly from the above one. We
suppose given two types A and B, the latter containing two distinct elements
y₀ and y₁ such that we can decide the equality to y₀, and a surjective function
φ of type A → A → B and reach an absurdity:


no-surjection : ∀ {i} {A B : Set i} {y₀ y₁ : B} → y₀ ≢ y₁ →
                ((y : B) → Dec (y ≡ y₀)) →
                (φ : A → A → B) → surjective φ → ⊥
no-surjection {_} {A} {B} {y₀} {y₁} y₀≢y₁ dec φ surj =
  φxx≢fx x (cong-app p x)
  where
  f : A → B
  f x with dec (φ x x)
  f x | yes _ = y₁
  f x | no  _ = y₀
  φxx≢fx : (x : A) → φ x x ≢ f x
  φxx≢fx x p with dec (φ x x)
  φxx≢fx x p | yes refl = y₀≢y₁ p
  φxx≢fx x p | no ¬p = ¬p p
  x : A
  x = fst (surj f)
  p : φ x ≡ f
  p = snd (surj f)


Note that the construction of f requires us to be able to decide equality with y₀
which we also have to suppose given as argument. The proof of theorem A.4.1.2
can also be formalized if we assume the law of excluded middle, called lem below.
We define the predicate of being injective by


injective : ∀ {i} {A B : Set i} (f : A → B) → Set i
injective f = ∀ {x x'} → f x ≡ f x' → x ≡ x'


and then show the theorem by following the proof given above, which is based 
on the previous function 


no-injection : ∀ {i} {A B : Set i} {y₀ y₁ : B} → y₀ ≢ y₁ →
               (lem : (A : Set i) → Dec A) →
               (ψ : (A → B) → A) → injective ψ → ⊥
no-injection {_} {A} {B} {y₀} {y₁} y₀≢y₁ lem ψ inj =
  no-surjection y₀≢y₁ (λ y → lem (y ≡ y₀)) φ surj
  where
  φ : A → A → B
  φ x with lem (fib ψ x)
  φ x | yes (f , p) = f
  φ x | no ¬p = λ _ → y₀
  ψφψ≡ψ : (f : A → B) → ψ (φ (ψ f)) ≡ ψ f
  ψφψ≡ψ f with lem (fib ψ (ψ f))
  ψφψ≡ψ f | yes (g , p) = p
  ψφψ≡ψ f | no ¬p = ⊥-elim (¬p (f , refl))
  surj : surjective φ
  surj f = ψ f , inj (ψφψ≡ψ f)